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of  the  estimators  as  well  as  computational  aspects  are  investigated.  The  study 
is  oriented  toward  real  time  applications.  Application  of  maximum  likelihood 
to  the  above  four  cases  differs  from  the  classical  situation  in  statistics 
because  the  measurements  are  not  identically  distributed  or  are  not  independent 
or  both.  The  resulting  estimates  are  roots  or  cumbersome/ion  linear  equations.^. 
First,  the  likelihood  equations  of  each  of  the  four  cases  are  developed.  Gener- 
ally, they  are  found  to  be  expressible  as  polynomials  in  the  unknown  parameters. 
The  degree  and  complexity  of  the  polynomial  likelihood  equations  grow  with  the 
number  of  samples  upon  which  the  estimates  are  based.  The  theorems  on  minimal 
sufficient  statistics  by  Dynkin  shown  that  this  characteristic  is  unavoidable. 
Some  finite  sample  properties  of  the  likelihood  equation  root  corresponding  to 
the  maximum  likelihood  estimate  for  the  scalar  model  are  sought  through  averagir 
or  using  limiting  conditions.  As  the  measurement  noise  variance  goes  to  zero, 
the  maximum  likelihood  estimate  is  seen  to  approach  the  true  value  of  the  para- 
meter. This  conclusion  is  extended  for  stable  autonomous  systems  by  showing 
that  the  true  parameter  value  is  the  only  stable  root,  i.e.,  the  only  root  in 
(-1,1).  Simulation  results  indicate  this  extension  holds  for  forced  systems 
also.  A related  result  but  without  the  above  assumptions  is  also  proven.  In 
all  four  cases,  on  the  average  the  true  parameter  value  is  a root  of  the  corres- 
ponding likelihood  equation.  Also,  root  distributions  as  found  from  Monte  Carle 
simulations  are  displayed.  Consistency  is  shown  when  the  initial  condition  is 
known  and  the  system  is  stable  and  is  inferred  for  the  unknown  initial  conditior 
cases.  If  the  system  is  stable  and  autonomous,  the  proof  for  consistency  fails 
because  of  a loss  of  uniqueness  in  the  limit.  Numerical  solutions  for  the  esti- 
mates depend  on  such  factors  as  root  distribution,  likelihood  equation  sensitivit 
to  coefficient  perturbations,  and  shape  of  the  likelihood  function  derivative. 
Simulation  results  for  stable  systems  show,  that  at  least  up  to  moderate  noise 
levels,  while  the  likelihood  equation  has  multiple  roots,  generally  none  of  the 
other  roots  appear  to  lie  in  the  neighborhood  of  the  maximum  likelihood  estimate 
and  the  likelihood  function  derivative  is  relatively  smooth  and  insensitive  to 
noise  in  the  neighborhood  of  the  maximum  likelihood  estimate.  To  overcome 
continued  increase  in  the  number  of  computations  and  in  the  amount  of  storage 
required,  two  approximations  are  proposed.  In  one,  the  likelihood  equation  poly 
nomials  are  truncated  and  averaged  coefficients  are  used.  In  the  other,  a curv« 
fitting  scheme  is  presented  which  exploits  a recursive  aspect  of  the  likelihood 
function  derivative.  The  former  in  its  simplest  form  has  only  two  real  roots, 
one  in  the  stable  region  and  the  other  outside.  Under  certain  restrictions  it 
is  shown  to  yield  a consistent  estimate. 
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PREFACE 


Flight  vehicle  parameter  analysis  and  synthesis  requires — indeed, 
demands — most  effective  vehicle  parameter  dtermination  techniques  for 
many  reasons  including  vehicle  parameter  confirmation.  This  report  pre 
sents  some  of  the  most  powerful  results  developed  to  date. 
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SECTION  I 


The  experimental  determination  of  the  numerical  values  of  parent' 


eters  in  an  otherwise  completely  known  mathematical  model  of  a systi 


baaed  on  measurements  of  quantities  which  are  functions  of  the  par 


eters  is  generally  known  as  parameter  identification.  When  the 


measurements  are  subject  to  random  inaccuracies  (noise) , the  identifi 


cation  problem  can  no  longer  be  trivial.  Parameter  identification 


using  noisy  measurements  is  the  problem  of  estimating  a parameter  6 


whose  true  value  is  0Q,  from  a sample  (Jfj  , — ,xn)  assumed  to  have  been 
drawn  from  a population  having  a distribution  function  of  specified 
functional  form  F(x/0)  but  where  0 is  unknown  and  0 ,0Q  £ ©,  the  set 
of  atteissible  values  of  0. 


The  identification  problem  arises  in  the  development  of  mathemat 


ical  models  of  systems.  Frequently,  physical  laws  or  established 


empirical  relationships  exist  from  which  the  functional  form  of  the 


model  can  be  determined.  However,  physical,  engineering,  or  economic 


limitations  can  prevent  the  direct  measurement  of  certain  aspects  of 


the  system  required  to  completely  satisfy  the  model 


Because  the  mathematical  model  is  a common  tool  for  analysis  in 


many  fields,  the  identification  problem  is  similarly  widespread 


Specific  examples  occur  in  economics,  biology,  geology  and  engineering 


The  need  for  identification  in  engineering  often  comes  about  as  part 


of  a larger  problem  - optimvxn  or  adaptive  automatic  control  of  systems 


subject  to  stringent  performance  requirements.  This  situation  can  be 


found  in  such  areas  as  industrial  process  control  and  control  of  high 
performance  aircraft  and  aerospace  vehicles. 


fc 
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The  characteristic  of  identification  in  automatic  control  applica- 
tions that  tends  to  make  it  distinct  from  those  in  other  areas  is  the 
relatively  short  ti®»  in  which  the  identification  must  be  accomplished 
to  be  useful.  The  identification  problem  generally  takes  the  form  of 
completing  the  description  of  the  relationships  between  the  input 
states  of  the  plant  and  its  output  states.  The  parameters  to  be 
identified  usually  are  the  coefficients  of  equations  (difference, 
differential,  or  partial  differential)  of  the  plant  model  taken  as 
linear  with  constant  or  slowly  varying  coefficients.  Noisy  output 
measurements  are  assumed,  but  because  inputs  to  the  plant  can  often  be 
generated  with  considerably  less  uncertainty  than  exists  in  the 

measurement  of  the  output  states,  input  signals  frequently  are  taken  as 
known.  The  identification  is  carried  out  with  normal  operating  input 
or  with  no  more  than  minor  perturbations  to  the  input.  (If  no  restric- 
tions on  input  exist,  then  conceivably  the  identification  problem  could 
be  made  trivial  by  adjusting  the  input  so  that  the  output  signal  swamps 
the  measurement  noise.) 

The  number  of  techniques  available  for  identification  of  param- 
eters is  large.  Among  the  various  possibilities  that  are  statistical 
in  nature,  maximum  likelihood  is  often  considered  as  a standard  of 
comparison  largely  because  of  the  desirable  large  sample  properties  it 
typically  has.  To  use  this  method  sufficient  information  must  be 
available  to  determine  the  functional  form  of  the  distribution  of  the 
measurements.  Considering  the  joint  probability  distribution  of  the 
measurements  as  a function  of  the  unknown  parameter  6,  the  maximum 


likelihood  estimate  of  6 is  that  value  of  6 € 0 for  which  the  function 
is  a maximum.  The  estimate  is  usually  found  by  seeking  the  roots  of 
the  derivative  of  that  function  (known  as  the  likelihood  function) . 

The  literature  on  identification  in  control  systems  is  fairly 
extensive,  but  only  a relatively  small  portion  is  devoted  to  maximum 
likelihood  estimation  (probably,  because  of  the  fact  that  with  the 
exception  of  certain  special  cases,  the  evaluation  of  the  estimate 
can  be  a difficult  nisnerical  problem) . One  important  aspect  in  the 
application  of  maximum  likelihood  estimation  to  the  identification  of 
parameters  in  dynamic  systems  about  which  there  appears  to  be  little 
written  is  the  effect  that  various  levels  of  information  on  the  initial 
conditions  of  the  system  have  on  the  form,  properties,  and  ease  of 
solution  of  the  estimator.  The  primary  purpose  of  this  study  is  to 
investigate  these  effects.  The  basic  model  used  was  a linear  constant 
coefficient  difference  equation  plant  whose  output  measurements  were 
corrupted  with  additive  gaussian  noise. 

Chapter  2 discusses  in  considerable  depth  the  identification 
problem  as  it  arises  in  modeling  dynamic  systems  and  presents  an 
extended  review  of  the  pertinent  literature. 

In  Chapter  3 the  maximum  likelihood  estimators  are  developed  for 
the  basic  model  under  each  of  three  assumptions  on  the  nature  of  the 
initial  condition  - known,  unknown  parameter,  and  unknown  random 
variable  with  known  gaussian  distribution.  A fourth  situation 
involving  correlated  noise  and  based  on  an  equivalent  form  of  the  basic 
model  is  also  treated.  It  represents  an  extension  of  a much  earlier 
work  and  treats  the  initial  measurement  as  a known  deterministic 
initial  condition. 
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The  estimators  are  given  in  the  form  of  likelihood  functions  for 
scalar  and,  with  one  exception,  for  matrix  parameters  in  the  case 
without  plant  noise  and  for  only  scalar  parameters  with  plant  noise. 
Because  the  likelihood  equations  grow  in  complexity  with  the  number  of 
measurements,  the  question  of  existence  of  minimal  sufficient  statis- 
tics which  would  overcome  this  problem  was  examined. 

In  Chapter  4 am  analytical  investigation  of  the  properties  of  the 
estimators  of  Chapter  3 is  made.  The  characteristics  of  the  roots  of 
the  likelihood  equations  and  the  number  of  stable  roots  for  finite 
samples  are  investigated.  Large  sample  properties  of  the  estimates 
are  established.  Averaging  approximations  to  the  maximum  likelihood 
estimate  are  proposed.  Their  finite  sample  and  large  sample  properties 
are  also  discussed. 

In  Chapter  5 the  numerical  aspects  of  the  estimators  developed 
in  Chapter  3 are  treated.  Results  for  evolution  of  the  estimates  as 
the  number  of  samples  increases,  examples  of  the  functional  behavior 
of  the  derivatives  of  the  joint  distributions  of  the  measurements  and 
histograms  for  root  distribution  based  on  Monte  Carlo  simulations  are 
given.  Numerical  evaluation  of  the  roots  of  the  likelihood  equation 
and  a recursive  curve  fitting  approximation  are  considered.  The 
averaging  approximation  and  the  curve  fitting  approximation  are  simu- 
lated and  compared  to  maximum  likelihood  and  least  squares. 

In  Chapter  6 the  summary  of  results  and  conclusions  are  presented. 
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SECTION  II 

BACKGROUND 


Systems  identification  is  concerned  with  the  experimental  deter- 
mination of  a mathematical  model  to  characterize  a system  through  the 
use  of  measured  input-output  data.  Frequently,  this  problem  appears 
in  a form  where  the  only  unknown  aspects  are  the  numerical  values  of 
various  parameters  in  an  otherwise  completely  defined  mathematical 
model.  This  situation  is  referred  to  as  parameter  identification  (or 
parameter  estimation). 

Early  methods  of  parameter  identification  in  control  systems 
tended  to  be  ad  hoc  in  nature.  Later,  the  already  highly  developed 
identification  techniques  in  statistics,  and  in  particular,  maximum 
likelihood  estimation,  were  adapted  to  applications  in  control  systems 
problems.  Thus,  a complete  view  of  the  development  of  parameter 
estimation  in  control  systems  requires  an  appreciation  for  the  relevant 
contributions  in  the  field  of  statistics  as  well  as  in  control  systems. 

2.1  PARAMETER  IDENTIFICATION  IN  CLASSICAL  STATISTICS 

The  problem  of  parameter  estimation  in  classical  statistics  deals 
with  obtaining  a best  estimate,  in  some  statistical  sense,  of  a 
parameter  vector  6 on  the  basis  of  measurements  which  are  in  error. 
In  general,  the  model  takes  the  form: 

»i  - fl(S)  + (2.1) 

(The  term  "regression"  is  usually  reserved  for  estimation  with  this 
model . ) 
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2.1.1  MODEL  DEVELOPMENT 


The  basic  thread  that  runs  through  the  branch  of  statistics  that 
moved  to  the  point  where  it  had  essentially  direct  application  in 
control  systems  was  the  development  of  models  of  stochastic  systems, 
especially  in  time  series  analysis.  According  to  Parsen  [1961]  and 
Wold  [1954] , a series  of  advances  in  the  modeling  of  stochastic  systems 
was  made  starting  in  the  1920's.  Yule  in  1927  developed  the  scheme  of 
linear  autoregression  by  modeling  observations  rfc  as  linear  combina- 
tions of  previous  observations  plus  noise,  i.e., 

xt  ■ aJ*t-2  * •••  + Vt-ffl  ♦ et  (Z.2) 

where  m is  the  order  of  the  autoregressive  scheme  and  the  sequence  {efc) 
consists  of  independent  identically  distributed  random  variables.  Also 
in  1927 , Slutsky  introduced  the  notion  of  a moving  average  scheme  in 
which  observations  xfc  are  assumed  to  be  generated  by  a shifting  linear 
combination  of  members  of  an  independent  identically  distributed 
sequence  of  random  variables  i.e., 

xt  " aont  + alnt-l  + *••  + amnt.m  (2.3) 

The  theory  of  discrete,  random,  stationary  processes  emerges  with 
the  work  of  Kintchine  during  1932-1934.  Wold  in  1938  combines  the  work 
of  the  above  to  show  that  moving  average  schemes  and  autoregressive 
schemes  are  special  cases  in  the  theory  of  stationary  random  processes. 
Finally,  combining  moving  average  and  autoregression  schemes  yields  a 
model  that  is  closely  related  to  the  modern  linear  stochastic  control 
theory  model: 

xi+l  s &*i  + Bui  + “i 
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Vi  - + (2.4) 

where  the  measurements  are  y±,  and  the  (n i)  and  {u^}  are  noise  se- 
quences . 

2.1.2  CLASSICAL  METHODS  OF  PARAMETER  IDENTIFICATION 

A number  of  statistical  schemes  for  identification  exist.  Intu- 
itively, a generally  desirable  property  of  an  estimator  6 compared  to 
any  other  estimator  6 would  be  minimum  mean  square  error  in  an 
admissible  set, 

E(§-Q)2  < E(Q-Q)2  (2.5) 

A 

or  minimum  variance  if  6 and  6 are  unbiased. 


V(Q)  < 7(6)  (2.6) 

where  E is  the  expectation  operator  and  V is  the  variance  operator. 
Frequently,  identification  problems  in  control  systems  are  approached 
by  directly  seeking  a scheme  with  one  of  the  above  properties.  In  most 
of  the  remaining  cases  where  statistical  estimates  are  sought,  one  of 
the  classical  statistical  methods  is  chosen.  The  three  most  popular 
methods  appear  to  be  Gauss-Markov,  maximum  likelihood,  and  Bayes. 

The  Gauss-Markov  estimation  technique  applies  to  the  linear  model, 

y * H6  + e (2.7) 

where  the  expectation  E(e)  = 0,  E(eeT)  « R>0,  and  s is  assumed  to  have 
maximal  rank.  Minimization  of  the  cost  function, 

c = eTR -1  e 

leads  to  the  Gauss-Markov  estimate,  (2-8) 

0 - (grR~1B)~lHTR~ly  (2.9) 

If  R = a2I,  a2  a scalar  constant,  then  0 is  known  as  the  least  squares 
estimate.  The  Gauss-Markov  estimate  is  the  minimum  variance  unbiased 
linear  estimate  (Rao  (1965)). 

7 


Note  that  the  Gauss-Markov  estimate  requires  knowledge  of  the  first 
two  moments  of  the  probability  distribution  of  the  error  e.  A least 
squares  estimate  by  virtue  of  its  definition  can  be  used  with  no 
second  moment  information.  The  term  "least  squares"  is  often  used  to 
describe  more  general  minimum  mean  square  error  estimations  than  its 
strict  definition  would  encompass. 

An  interesting  bit  of  history  is  related  to  the  development  of  the 
method  of  maximum  likelihood  estimation.  Undoubtedly,  the  earliest 
major  milestone  in  parameter  estimation  occurs  in  the  works  of  Gauss 
where  he  places  the  method  of  least  squares  on  a rather  firm  foundation. 
Curiously  enough.  Gauss  apparently  used  the  principle  of  maximum  like- 
lihood to  accomplish  this  but  later  rejected  the  principle  as  a meaning- 
ful approach  to  estimation  in  its  own  right.  Edgeworth  [1900]  in  a 
translation  of  a letter  from  Gauss  to  Bessel  in  1839  reveals  Gauss 
stating , 

'"...That  the  metaphysic  emp'loyed  in  my  Theoria  Motus  Corp.  Coel. 
to  justify  the  method  of  least  squares  has  been  subsequently 
allowed  by  me  to  drop  has  chiefly  occurred  for  a reason  that  I 
have  myself  not  mentioned  publicly.  The  fact  is,  I cannot  but 
think  it  in  every  way  less  important  to  ascertain  that  value  of  an 
unknown  magnitude  the  probability  of  which  is  greatest-which 
probability  is  nevertheless  infinitely  small-rather  than  that 
value  by  employing  which  we  render  the  Expectation  of  detriment 
a minimum.* • ' 

However,  this  stigma  on  maximum  likelihood  estimation  was  finally  over- 
come in  1922  by  Fisher  (Berkson  [1956]). 

The  method  of  maximum  likelihood  estimation  requires  that  the 
probability  density  of  the  error  (or  noise)  be  known  at  least  within 
some  constants,  e.g.,  mean  or  variance.  Using  the  equations  of  the 
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modal  and  tha  distribution  of  tha  noisa,  tha  likalihood  function  L 
can  in  principle  ba  determined  where 

* P(Vi'”',VntB)  (2.10) 

For  a given  sat  of  measurements  yJf> •*,¥„.  the  value  of  the  unknown 
parameters  0 which  maximises  £ is  the  maxlmian  likelihood  estimate  of  6 . 

According  to  Rao  [1965],  [1952],  and  Finney  [1960],  maximun  likeli- 
hood (ML)  estimates,  9^,  have  several  desirable  properties  under  a wide 
variety  of  situations.  Among  them  are: 

1.  The  ML  estimate  is  consistent,  8^*  8 or  6^ — *-♦  8. 

2.  The  ML  estimate  is  asymptotically  efficient,  i.e.,  among 
consistent  estimators  it  has  minimum  variance  in  the  limit. 

3.  For  large  samples,  the  distribution  of  0^  becomes  normal. 

4.  If  L possesses  a sufficient  estimator  for  0,  then  0,  is 
sufficient. 

Berkson  [1956]  points  out  that  the  nice  properties  of  ML  estimates 
occur  in  the  limit.  This  asymptotic  information  is  not  necessarily 
useful  in  any  practical  situation.  Except  in  those  special  cases  where 
the  least  squares  estimate  and  ML  estimate  are  identical,  e.g.,  with 

! normal  distributions,  little  has  been  said  about  finite  sample  proper- 

ties of  ML  estimates. 

For  Bayesian  estimation,  some  a priori  information  on  the  proba- 
bility densities  of  the  parameters  8 in  addition  to  the  noise  densities 
is  required.  The  a posteriori  density  p(8/y)  is  found  from  Bayes'  Rule, 

p(8/y)  - p(y/*)p(*)/p(y)  (A.  11) 


Various  types  of  estimates  can  be  obtained  from  this  density  (Ho  and 
Lee[1964]  and  Stear  [1970]).  The  minimum  variance  unbiased  estimate 


, . .. « — - — 


t 


of  8 is  tha  conditional  naan  of  p(6/y) . Often  this  ia  difficult  to 
find,  and  inataad  tha  mode  of  p(8/y)  ia  uaad  aa  an  eatimate  of  8.  The 
latter  eatimate  ia  sometimes  known  aa  a posteriori  maximum  likelihood. 

2.1.3  MAXIMUM  LIKELIHOOD  ESTIMATION  OF  PARAMETERS  IN  DIFFERENCE 
EQUATIONS 

Prior  to  tha  development  of  modern  control  theory,  i.e.,  prior  to 
about  1960,  the  literature  contains  relatively  few  examples  of  general 
applications  of  maximum  likelihood  estimation  for  identification  of 


parameters  in  difference  equations.  Two  of  the  better  known  and  more 
significant  contributions  are  briefly  reviewed  below. 

Koopmans  [1937]  in  a monograph  on  linear  regression  investigates 
maximum  likelihood  estimation  of  regression  coefficients.  In  state 
variable  notation,  the  model  in  his  more  general  case  is  of  the  form: 

^ ’ *i  + (2.12) 

where  the  (nj } is  a sequence  of  sero  mean  independent  normal  random 
vectors  with  covariance  R, 

R - o2lrij] 

and  where,  u^T  - (0 ,• • • ,0 ,c)  (c  » scalar  constant) 

and  A is  a companion  matrix,  i.e., 

" I 

I 

0 | X 

A - ! 

I 

aT 

. — 

Using  maximum  likelihood  he  estimates  o2,  c,  x j , and  a.  (An  explicit 
expression  for  the  estimate  of  a generally  cannot  be  given  because  to 


i 
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find  the  estimate  an  eigenvalue  problem  must  be  solved.)  In  addition, 
he  looks  into  the  asymptotic  properties  of  the  estimates  and  the  sig- 
nificance of  ft  being  singular. 

Mann  and  Wald  [1943]  treat  a related  problem.  Their  model  in  the 
scalar  case  has  the  form: 

*t  " * a2*t-2  *•••+  a^t-p  + <*o  + et 


yt  - *t 


(2.13) 


where  the  regression  equation  is  assumed  to  be  stable  and  the  (et)  are 
independent  normally  distributed  random  variables  with  zero  mean  and 
variance  o2.  The  maximum  likelihood  estimates  of  aj , • • • #®p/00,o2  are 
shown  to  be  the  solution  of  a set  of  linear  algebraic  equations.  They 
prove  the  estimates  are  consistent  and  asymptotically  normal. 

The  second  half  of  the  paper  deals  with  the  more  general  case  of 
several  equations  in  several  variables,  i.e.. 


7*1  Jc« 


'ijlfj.t-k  * ai  * cit  1 ~ i'"*'r 


(2.14) 


or  in  state  variable  form  with  an  n x n companion  matrix  and 
assuming  the  matrix  | a ifj/Q  J is  non-singular 


n 


(2.15) 


where : 


SjT  * (0,'m’,0,aij)  , s2jr  ■ (<>,•••  ,0,1  j})  , n - max  p ^ 

1 rj 

A development  similar  to  the  scalar  case,  but  considerably  more  complex, 
is  given  for  this  case. 
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2.2  PARAMETER  IDENTIFICATION  IN  CONTROL  SYSTEMS 


Parameter  identification  in  the  terminology  of  stochastic  control 


systems  generally  refers  to  the  estimation  of  the  parameters  p in  the 
otherwise  known  functional  relations  (or  their  discrete  equivalent) : 


where  n and  £ are  unknown  random  variables  and  u,  t,  and  the  measure 


ment  y are  known.  In  the  linear  case,  a typical  form  would  be 


where  the  elements  of  p would  be  the  elements  of  the  matrix.  A and,  in 


addition,  could  include  the  elements  of  B and  H 


Occasionally,  the  parameter  vector  p also  includes  unknown 


as  a separate  problem  in  control  systems  except  in  those  situations 


where  identification  is  most  efficiently  achieved  when  carried  out 


jointly  with  state  estimation 


The  literature  on  identification  in  control  systems  has  been  quite 
extensive  and  varied.  Significant  numbers  of  publications  began  to 
appear  in  the  middle  and  late  1950's  and  have  continued  to  the  present. 
The  early  applications  naturally  tended  to  be  ad  hoc  in  approaches  and 
typically  were  concerned  with  estimating  the  impulse  response  of  linear 
systems.  By  the  early  1960's,  approaches  employing  more  powerful 


statistical  techniques  were  appearing  with  greater  frequency.  Within 

a few  years,  most  of  the  applicable  tools  of  the  statistician  seem  to 

Have  been  borrowed  by  the  controls  engineer. 


Before  launching  into  a review  of  the  literature  one  would  hope  to 
be  able  to  identify  categories  into  which  to  group  the  many  contribu- 
tions. There  seems  to  be  no  satisfactory  way  to  do  this.  However, 
for  this  discussion  two  major  groupings  serve  as  a guide  - those 
methods  which  are  based  on  least  squares  or  minimum  square  error 
criterion  and  those  which  are  statistical  in  nature.  Within  these 
classifications,  the  literature  can  be  grouped  by  specific  techniques. 
There  are  other  characteristics  which  could  be  used  such  as  (1)  the 
model  type  - linear  or  nonlinear,  continuous  or  discrete,  single  or 
multiple  input/output,  constant  coefficient  or  time  varying  or  (2)  the 
quantity  being  identified  - difference  or  differential  equation 
coefficients,  Laplace  or  Z-transform  coefficients,  or  the  entire 
impulse  response  or  (3)  restrictions  such  as  real  time  estimation  or 
use  of  only  normal  operating  input. 

2.2.1  MINIMUM  SQUARE  ERROR  METHODS 

Among  the  popular  early  methods  and  ones  which  continue  to  receive 
attention  are  numerical  deconvolution  and  related  impulse  response 
approximation  methods.  These  techniques  are  not  truly  parameter 
identification  methods  in  the  sense  of  the  definition  given  earlier. 
Their  objective  is  to  determine  seme  best  values  for  undetermined 
parameters  in  combinations  of  functions  which  are  intended  to  approxi- 
mate the  input/output  characteristics  of  the  actual  system.  (In 
nwerical  deconvolution  these  parameters  are  discrete  time  values  of 
the  impulse  response.) 


In  this  regard  Zabusky  (19S6)  works  with  the  convolution  equation 


l 


of  a linear  continuous  system 

x(t)  • f h(r)u(t~x)dr 

*0 


(2.18) 


and  seeks  the  value  of  the  system's  impulse  response  h(t)  at  discrete 
points  in  time.  He  approximates  the  impulse  response  by  products  of 
exponentials  and  polynomials  with  undetermined  parameters.  The 
parameters  are  fixed  by  minimizing  c where 


. -/’ 


(x(t)  - Xc(t)]2dt 


(2.19) 


and  xc(t)  is  the  output  of  the  approximate  model.  The  two  systems  are 
subject  to  identical  inputs. 

Goodman  and  Reswick  [1956]  perform  a similar  investigation  but  use 
delay  lines  and  recognizing  the  noise  problems  with  convolution  equa- 
tions. base  the  deconvolution  on  the  correlation  equation 

t 


♦ 


xu 


-/ 


h(t-T)*uu  (x)  dx 


(2.20) 


Goodman  [1957]  extends  his  previous  year's  work  to  multiple  input/out- 
put systems.  Taylor  series  is  used  with  the  convolution  integral  by 
Braun  [1959].  Orthogonal  filters  are  used  by  Elkind,  et  al.  [1963], 
Eykhoff  [1963] , and  Kekre  and  Glenski  [1968]  but  with  different 
approaches . 

The  model  reference  technique  is  similar  to  those  already  dis- 
cussed except  that  this  method  is  used  when  the  true  system  transfer 
function  is  known  but  for  some  or  all  the  coefficients.  Again,  the 
system  and  the  model  are  driven  by  the  same  input.  The  free  parameters 
in  the  model  are  adjusted  to  minimize  some  measure  of  the  differences 
in  their  outputs,  generally  the  integral  of  the  squares  of  the  difference 
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Probably  the  moat  frequently  referenced  paper  in  this  area  is  the 
one  by  Margolis  and  Leondes  (19591.  They  use  the  integral  of  the 
square  of  the  output  error  and  its  derivatives  as  their  cost  function 
and  by  a gradient  method  drive  the  coefficients  in  the  model  to 
minimise  the  cost.  Surber  [1963]  presents  a relatively  comprehensive 
investigation  of  the  model  reference  method.  Hsia  and  Vimolvanich 
[1969]  apply  the  technique  to  the  tracking  of  variable  parameters  in  a 
linear  control  system.  They  adjust  the  model  parameters  by  differen- 
tial equations  and  explicitly  account  for  measurement  noise. 

There  is  another  group  of  methods  that  is  perhaps  best  described 
as  least  squares  methods.  Included  here  is  Turin  [1957]  who  estimates 
the  impulse  response  of  a system  by  designing  a filter  whose  input  is 
the  output  of  the  system  and  whose  output  is  the  estimate.  The  design 
criterion  is  the  integral  of  the  square  of  the  difference  between  the 
true  and  estimated  responses.  (Techniques  like  that  of  Turin  which 
give  continuous  real  time  estimation  of  an  unknown  impulse  response 
often  are  referred  to  as  "matched-filter  identification"  (Gibson 
[1963]).)  King  [1967]  proposes  an  off-line  gradient  solution  to  the 
problem  of  identifying  system  parameters  subject  to  the  cost  function, 
the  integral  of  the  square  of  the  difference  in  true  and  measured 
output.  A similar  situation  but  with  a discrete  model  is  treated  by 
Aoki  [1967a]  where,  in  addition,  the  feasibility  of  estimating  only  part 
of  the  A matrix  when  the  A matrix  contains  seme  known  elements  is 
considered.  Dolbin  [1969]  also  deals  with  the  discrete  control  prob- 
lem but  has  unknown  parameters  in  A,  B,  and  H matrices.  He  develops  a 
gradient  type  solution. 
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The  equation  error  model  approach.  Figure  2 -la,  refers  to  the 
least  squares  regression  problem  where  the  system  is  represented  by  a 
difference  equation 

xi  * *lxi-l  Vi-P  ' Vi-2  +'**+  Vi-P  (2.21) 

and  the  coefficients  a^,  b k are  found  by  introducing  measured  (as 
opposed  to  true)  input-output  data  into  the  equation  and  minimizing  the 
resulting  error.  By  contrast,  the  model-plant  error  approach.  Figure 
2 -lb,  seeks  to  minimize  the  difference  in  measured  plant  output  and 
model  output  for  the  same  input  through  adjustment  of  ak  and  bk. 
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a.  equation  error  b.  model -plant  error 

Figure  2-1 

The  equation  error  model  problem  led  to  a series  of  papers. 

Kalman  [1958]  treated  the  equation  error  problem  without  noise.  He 
sought  the  coefficients  in  N and  0,  both  of  which  are  polynomials  in  z, 
in  order  to  minimize  the  sum  of  ej2.  (See  Figure  2-1.)  This  was 
accomplished  by  Gauss-Siedel  iteration  on  a set  of  equations  involving 
weighted  correlation  functions.  Later,  Steiglitz  and  McBride  [1965] 
solve  the  model-plant  error  problem  by  repeated  application  of  the 
equation  error  method  and  prefiltering.  Lion  [1967]  improves  on 
Steiglitz  and  McBride  by  introducing  filters  prior  to  N(Z)  and  D(Z) 
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in  the  equation  error  scheme  end  uses  a gradient  on  e2  to  derive  the 
polynomial  coefficients.  (A  special  version  of  this  solution  was 
given  earlier  by  Weygandt  and  Puri  [1961].)  The  Steiglitz  and  McBride 
solution  was  improved  upon  by  Schultz  [1968]  by  applying  quasilinear- 
ization to  the  model-plant  error  case. 

Lendaris  [1962]  and  Weygandt  and  Puri  [1966]  present  r atively 
complex  methods  for  finding  the  coefficients  of  the  plant's  transfer 
function  by  using  Z-transforms.  Solutions  of  sets  of  algebraic 
equations  and  roots  of  polynomials  are  required  to  do  this.  Because 
a differencing  scheme  on  the  system  output  is  used  in  this  method,  the 
estimates  are  likely  to  be  sensitive  to  noise.  Neither  of  the 
approaches  incorporate  any  smoothing,  but  they  could  be  extended  to  do 
so.  Hoppe  [1965]  has  a related  method  but  incorporates  integration 
for  snoothing. 

2.2.2  STATISTICALLY  ORIENTED  METHODS 

Cross-correlation  and  cross-spectral  mathods  can  be  used  to 
determine  the  impulse  response  of  linear  time  invariant  systems.  By 
observing  system  input  and  output  and  forming  auto-  and  cross-correla- 
tions, the  system  impulse  response  can  be  found  from  the  correlation 
equation  (2.20)  assuming  all  stochastic  aspects  are  stationary  and 
independent.  Goodman  and  Reswick  [1956]  and  Goodman  [1957]  have 
already  been  mentioned  as  early  examples.  Later,  Levin  [1960]  demon- 
strates that  a relation  exists  between  optimal  least  squares  estimates 
of  the  impulse  response  and  correlation  methods,  siting  the  Weiner  Hopf 
equation  as  the  link.  He  also  develops  a finite  numerical  deconvolu- 
tion scheme  which  uses  a discrete  version  of  the  correlation  equation 


and  sample  auto-  and  cross-correlations  to  yield  a least  squares  multi- 
point fit  to  the  system  impulse  response.  The  effect  on  optimality 
of  the  estimates  when  short  operating  records  are  used  is  investigated 
by  Kerr  [1961] . A comprehensive  discussion  on  correlation  techniques 
may  be  found  in  Akaike  [1967] . 

The  instrumental  variable  techniques  are  useful  with  linear 
regression  problems  (equation  error  models)  in  the  determination  of 
least  squares  estimates  of  the  regression  coefficients.  The  instrumen- 
tal variables  Z are  defined  as  an  additional  set  of  observations  with 
the  following  correlation  properties  (Goldberger  [1964] ) , 

plim  jj  Z^TV^(e)  — 0 (2.22) 

N-*» 

plim  jj  ZNrVN(x)  ■ P > 0 (2.23) 

N-»=° 

The  instrumental  variable  estimate  & of  the  vector  3 of  unknown 

n 

autoregression  coefficients  is  defined  as  (Hong  and  Polak  [1967]): 

Sw  “ (zNTvN(y))~lzNT*p  (2.24) 

where:  flr  * (a^,. . . ,ap) 

ZN  m appropriately  dimensioned  matrix 
of  instrumental  variables 

m (& j / • • • r*p) 

&rT  * (xt> — ,xN+r-l) 

«pT  * (up''--’uN+p-l> 

y±  " xi + ei 

aT2Cr  ■ uN+r-l 

The  main  advantage  of  instrumental  variables  is  that  they  always  yield 


j 

f 

? 
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consistent  estimates.  Reiersol  (1941]  is  generally  credited  with  being 
first  to  use  the  method.  A detailed  description  of  the  method  can  be 
found  in  Sargan  [1958],  and  an  informative  sunmary  is  given  by 
Goldberger  [ 1964] . 

Joseph,  et  al.  [1961]  use  the  input  to  their  linear  discrete 
system  as  the  instrumental  variable  when  applying  correlation  tech- 
niques to  establish  an  unbiased  estimator  of  the  Z-transform  of  the 
system's  plant.  Wong  and  Polak  discuss  the  application  of  the 
instrumental  variables  to  estimation  of  coefficients  of  a linear  auto- 
regression with  noisy  state  measurements , the  properties  of  instrumen- 
tal variables,  and  some  computationally  efficient  approximations. 

Since  some  freedom  exists  in  the  choice  of  the  instrumental  variable, 
Wong  and  Polak  explore  the  existence  of  optimal  sequences  of  these 


variables . 

An  approach  to  parameter  estimation  that  commonly  occurs  is  to 
augment  the  system  state  vector  with  the  unxnown  parameter  vector  p 
by  introducing  additional  state  equations  to  describe  the  dynamics  of 
the  parameters.  These  equations  typically  have  the  form, 

p - op  + 6 (2.25) 

where  8 is  either  a deterministic  or  a random  variable  but  often  both 


a and  6 are  sero. 

This  formulation  even  with  a linear  system  leads  to  a nonlinear 
arrangement  when  augmented  because  some  of  the  terms  in  the  system 
equations  by  definition  become  products  of  state  variables.  In  this 
situation  the  usual  approach  is  to  estimate  simultaneously  the  original 
state  variables  and  the  parameters.  State  vector  estimation  with  a 


nonlinear  nodal  is  known  as  nonlinear  filtering  and  typically  requires 
solution  of  multipoint  boundary  value  problems  to  determine  the 
estimates. 

A relatively  early  work  in  nonlinear  filtering  that  appears  to 
have  become  a basic  reference  for  much  of  the  work  in  the  area  is 
Bryson  and  Frasier  [1962].  The  model  used  is  basically  the  general 
nonlinear  one,  Equation  (2 . 16),  given  earlier.  The  objective  is  to  find 
the  minimizing  state  variable  function  x(t)  (which  for  the  identifica- 
tion problem  would  have  included  the  parameters  p(t))  for  the  cost 
function  Jt  where, 

Jt  - *(x(to)-v)'po  _1  (x(tQ)-u)  + i j (2.26) 

to 

subject  to  the  constraints  : 

x - g(x,u(t)  ,t,C)  * 0 (-?.  27) 

h(y,x,r\)  m 0 

and  where  p ■ E(x(t  )),  C • K-E((.),  and  fi  - n-E(n) . 
o 

The  minimizing  xft)  is  found  by  the  method  of  steepest  descent. 

Although  Bryson  and  Frazier  did  not  explicitly  concern  themselves 
with  parameter  estimation,  their  formulation  could  have  incorporated 
that  task.  In  fact,  most  nonlinear  estimation  schemes  could  handle 
parameter  estimation.  However,  for  the  most  part,  no  papers  were 
selected  for  the  following  discussion  which  did  not  mention  at  least 
some  connection  with  identification.  The  schemes  presented  divide 
into  two  groups  - those  that  are  patterned  directly  after  Bryson  and 
Frazier  but  use  some  other  modern  control  theory  solution  technique 
and  those  that  depart  along  the  way  by  linearizing.  These  papers  serve 
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to  illustrate  the  variety  of  ways  the  nonlinear  filtering  problem  can 
be  attacked. 

A number  of  solutions  of  the  first  category  use  quasilinearisation 
instead  of  steepest  descent.  Kumar  and  Sridhar  [1964]  having  measure- 
ments at  a number  of  discrete  points  solve  the  estimation  problem  as 
a multipoint  boundary  value  problem  using  p ■ 0.  Lavi  and  Strauss 
[1965]  show  that  if  the  total  number  of  measurements  (boundary  points) 
does  not  exceed  the  total  number  of  free  variables,  the  solution  may 
not  be  unique.  They  use  excess  measurements  and  perform  a least 
squares  solution  as  do  Kumar  [1965]  and  Lee  [1968a] . 

Detchmendy  and  Shridhar  [1965] , on  the  other  hand,  treat  the  prob- 
lem with  noise  on  input  and  output  by  deriving  the  Hamiltonian,  the 
canonical  equations,  and  then  solving  by  invariant  imbedding.  Lee 
[1968b]  derives  a least  squares  version  of  Detctmendy  and  shridhar 's 
solution  and  applies  it  to  a chemical  reactor  problem.  Cox  [1964] 
gives  a Bayesian  approach  with  a dynamic  programming  solution > and  a 
Hamilton-Jacobi  route  is  used  by  Mortensen  [1968] . 

As  a result  of  the  work  of  Kalman  [1960],  [1961]  which  led  to  the 
computationally  desirable  recursive  linear  state  estimation  equations, 
and  the  expanding  role  of  digital  computers,  aspirations  for  Kalman 
type  solution  to  the  nonlinear  filtering  problem  were  heightened. 
However,  generally  this  Kalman  filter  characteristic  can  be  achieved 
only  by  linearising  the  nonlinear  problems  about  some  nominal  (except 
in  very  special  cases,  Farison  [1967]).  Examples  of  recursive 
solutions  by  linearisation  are  Kopp  and  Orford  [1963]  and  Budin  [1969] . 
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The  method  known  as  stochastic  approximation  is  one  which  has 
considerable  appeal  from  a computational  point  of  view  because  of  its 
recursive  nature.  It  resembles  the  recursive  linearised  solution  to 
the  nonlinear  filtering  problem  in  that  it  too  represents  a linear- 
isation and  the  gain  (or  relaxation  factor)  generally  depends  on  the 
error  covariance.  It  differs  by  computing  only  the  unknown  parameters 
and  not  the  entire  augmented  state  vector.  The  recursion  equation 
has  the  general  form  (Balakrishnan  and  Peterka  [1969]), 

*N+1  m *N  * pN7a^*N^  ^2‘21 

where  is  a predetermined  relaxation  factor  and  Q(.)  is  the  equation 

error  at  the  Nth  stage  with  a taken  as  4^.  The  basis  for  convergence 
of  this  technique  rests  heavily  upon  the  proofs  in  Dvoretsky  [1956] . 

Ho  and  Whalen  [1963]  and  Ho  and  Lee  [1965]  develop  stochastic 
approximation  solutions  for  the  linear  discrete  model  and  show  con- 
vergence of  their  estimates.  Sakrison  [1967]  treats  the  continuous 
time  problem  with  the  equation  error  type  model  and  develops  an 
algorithm  for  identifying  system  Laplace  transform  coefficients. 

Saridis  and  Stein  [1968]  extend  the  work  of  Ho  and  Lee. 

In  spite  of  convergence  claims  of  the  above  and  others, 
Balakrishnan  and  Peterka  state  that  this  method  has  fallen  short  of 
expectations  apparently  with  slow  convergence  being  a major  difficulty. 
Albert  and  Gardner  [1967]  give  a comprehensive  discussion  of  stochas- 
tic approximation. 

When  appropriate  statistics  on  the  parameters  to  be  identified 
are  available,  Bayesian  estimation  techniques  can  be  used.  Unfortu- 
nately, using  this  additional  information  tends  to  result  in  estimators 
of  greater  complexity  than  found  by  other  methods  that  require  less 
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information.  Examples  of  Beyeaian  identification  can  be  found  in 
Sawaragi  and  Katayama  (1967),  Aoki  [1967],  and  Kroy  and  stubberud 
(1967) . 

In  the  situation  where  noise  statistics  are  available,  maximum 
likelihood  methods  may  be  employed.  (For  ML  estimates  of  the  usual 
a priori  type,  parameter  statistics  are  not  used.)  Frequent  claims 
are  made  in  the  literature  that  particular  solutions  are  maximum 
likelihood  estimates.  Often  these  are  least  squares  estimates  made 
in  a situation  where  maximum  likelihood  gives  an  identical  estimate, 
e.g.,  with  gaussian  noise.  The  papers  described  below  are  not  members 
of  that  category. 

The  problem  of  Koopmans  (1937)  is  adapted  to  control  systems  by 
Levin  (1964) . The  model  used  was  a discrete  single-input,  single- 
output equation  error  type  with  input  and  output  measurmnent  noise. 

To  achieve  independence  among  the  measurements  as  in  Koopmans,  Levin 
has  to  stack  his  measurements.  This  means  that  parameter  estimates 
can  be  updated  only  after  each  new  stack  has  been  accumulated.  However, 
using  the  stacked  measurements  he  arrives  at  the  eigenvalue  problem  of 
Koopmans . The  estimate  is  shown  to  correspond  to  a least  square 
hyperplane  fit  to  the  data.  Properties  of  the  estimates  and  estimation 
with  overlapping  stacks  of  measurements  are  discussed. 

Astrcm  and  Bolin  [1965]  estimate  the  coefficients  in  the  shift 
operators  a,  b,  c,  and  the  scalar  X in  the  system  2 transfer  function 

•r(Z~l)y< t)  - * Xc T{Z~b»(t)  ( 2.29 ) 

where  a(Z)  • 1 ♦ SjX  ♦ , etc. 

and  the  ie(t))  is  a sequence  of  normal  independent 
random  variables 
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To  maximize  the  likelihood  function  so  that  the  estimates  can  be  found 


a Newton-Raphson  algorithm  was  developed  which  made  use  of  symmetry 


among  the  partials  in  order  to  reduce  the  number  of  computations 


Astrom  [1967]  applies  this  solution  to  the  control  of  a paper-making 


machine 


Smith  and  Hilton  [1965]  review  the  characteristics  of  the  least 


squares  solution  of  the  error  equation  model  and  Levin's  generalized 


least  squares  eigenvector  solution.  In  1967  they  presented  the  results 


of  a numerical  comparison  of  the  two  methods.  They  found  that  the 


than  those  of  the  eigenvector  method,  but  their  standard  deviations 


were  smaller.  Also,  using  overlapping  vectors  in  the  eigenvector 


method  substantially  reduced  the  variance  of  the  estimates 


Rogers  and  Steiglitz  [1967]-  approach  identification  in  the  model 


error  formulation  by  passing  the  output  error  through  a whitening 


filter  whose  coefficients  are  estimated  along  with  those  in  the  model 


An  approximate  Newton-Raphson  algorithm  is  implemented  to  find  the 


Smith  [1968]  explores  the  problems  of  recovering  the  Laplace 


transform  of  the  system  transfer  function  after  forming  a sample  data 


estimate  using  Levin's  eigenvector  method.  Mayne  [1966]  presents 


various  on-line  algorithms  for  particular  regression  problems  but 


finds  that  in  the  more  general  case,  the  method  of  Astrom  and  Bohlin 


performs  best.  Kashyap  [1970]  extends  the  work  of  Astrom  and  Bohlin  to 


include  vector  state  and  input  variables  but  without  the  moving  average 


input.  The  model  includes  correlated  plant  noise  and  uncorrelated 


output  measurement  noise.  He  develops  algorithms  for  estimates  of  the 
system  coefficients  as  well  as  the  plant  noise  correlation  matrices. 

2.3  I MINT IFI ABILITY 

The  problem  of  under  what  conditions  can  a meaningful  estimate 
of  a parameter  be  obtained  as  well  as  the  whole  question  of  input 
selection  clearly  are  of  interest  when  developing  parameter  identifica- 
tion techniques.  Identifiability  refers  to  the  ability  to  excite  all 
the  modes  of  a system  and  being  able  to  observe  the  results  of  the 
excitation.  Input  selection  deals  with  input  signal  design  to  best 
facilitate  identification.  (If  the  identification  scheme  is  restricted 
to  the  use  of  normal  operating  inputs,  then  optimal  input  selection 
is  not  of  any  direct  interest.)  Astrom  and  Bohlin  [1965J,  Currie 
[1968] , and  Staley  [1968]  pursue  identifiability.  Turin  [1957] , 
Gagliardi  [1967] , and  Staley  deal  with  input  selection. 


SECTION  III 


3.1  nmoDocrioM 

The  mathematical  development  of  a maxima  likelihood  estimator 
for  the  identification  of  unknown  parasaters  in  tha  mathematical  nodal 
of  a ayatan  can  bn  viewed  as  a three  step  operation  once  the  nodal  ia 


completely  defined  (except  for  tha  anknown  paranetera) . Employing  tha 
usual  terminology,  a.g.,  see  Cranir  [1966] , first  the  likelihood 
function  I,  nust  be  determined.  The  likelihood  function  is  the  density 
function  of  ths  nsasursnents  considered  as  a function  of  the  unknown 
paranstsr.  It  oan  be  viewed  as  a family  of  probability  density 


functions  p(y/9)  on  the  sanples  (or  measurement*)  of  the 

system  output  lndarod  by  the  unknown  parameter  vector  e which  lies  in 

some  set  0. 

In  most  of  the  literature  on  maxima  likelihood  estimation,  the 


las  are  assumed  to  be  independent  and  identically  distributed 


resulting  ini 


6 • p(Vi*»»'*ynl9)  » p(yx /B)p(y2/6) . . .p(yn/9) 


(3.T) 


However,  in  the  following  discussions  not  all  these  conditions  are 
present,  and  less  restrictive  definitions  will  be  required.  For 


independent  but  not  identically  distributed  samples: 

*•  " Pi(yit9)P2(y2tB),..pJl(yni9) 
and  for  sanples  which  are  neither  independent  nor  identically 


(3.2) 


distributed: 


L - P(Vi*y2»‘”'Vn>*) 


(3.3) 


The  unknown  parameters  6 in  all  but  one  of  the  cases  to  be 
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ence  equations  in  the  models  investigated.  In  that  exception,  6 


includes  the  initial  conditions  as  well  as  the  coefficients,  but 


explicit  estimation  of  the  initial  conditions  can  be  eliminated  from 


The  second  step  is  to  find  the  necessary  condition  for  the 


The  necessary  condition  is  known  as  the  likelihood 


equation  when  defined  as 


Of  course,  the  logarithm  and  the  derivative  must  exist,  and  the 


maximizing  0 € © must  not  be  a boundary  point  of  the  set  ©.  (Because 


gaussian  densities  and  6€K,  the  real  line,  will  be  assumed,  these 


conditions  will  be  satisfied.)  Unfortunately,  when  0 is  in  fact  a 


matrix,  finding  the  likelihood  equation  may  not  be  straightforward  if 


As  will  be  shown,  the  likelihood  equations  for  the  situations 


treated  here  can  quite  naturally  be  expressed  as  finite  polynomials  or 


more  precisely,  as  sums  of  finite  polynomials  in  the  unknown  parameters 


The  desirability  of  expressing  the  necessary  conditions  in  that  form 


on  the  properties  of  roots  of  polynomials.  On  the  other  hand,  there 


are  well-known  problems  associated  with  the  numerical  solution  of 


roots  of  polynomials  which  must  be  faced 


The  third  step  is  the  solution  of  the  likelihood  equation  for  the 


appropriate  roots.  Normally,  this  can  only  be  done  numerically 


Since  sufficient  conditions  for  the  maximization  of  the  likelihood 
function  will  not  be  sought,  it  will  be  assuned  that  other  means  will 
be  available  to  determine  which  real  root  is  the  desired  one  if  multiple 
real  roots  exist.  (The  solution  of  the  equations  is  the  topic  of 
Chapter  S.) 

3.2  PROBLEM  STATEMENT 

The  basic  mathematical  model  used  for  this  study  is  the  usual 
linear  discrete  control  system  model  but  without  plant  noise.  This 
model  can  correspond  to  a local  approximation  to  a more  complex 
nonlinear  system.  The  measurement  noise  is  treated  as  additive  and 
assuned  to  be  white  on  the  basis  that  in  practice  its  bandwidth  is 
frequently  found  to  be  much  wider  than  that  of  the  plant. 

The  model  is  represented  as  follows: 

Plant:  • Ax±  + Bu± 

Measurement:  yi  - HXj  + r\d  i - 0,1,... ,R  (3,5) 

where : 

Xj  " n -dimensional  state  vector 

y±  ■ m-dimens iona 1 measurement  vector 

u_j  ■ r-dimensional  input  vector 

“ m-dimensional  measurement  noise  vector 
with  , E[i]iT)jr]  - Riij,  R>0 

A • n x n constant  matrix 
B m n x r constant  matrix 
H m m x n constant  matrix 

The  following  are  known:  the  dimension  of  all  the  quantities,  the 
matrices  B and  H,  and  at  time  tN,  the  variables  Uj  and  y^. 
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I 1 

i - 0,1,..., N.  The  unknowns  are  the  matrix  A and  the  variables  xi  and 
r\it  i m 0,1,..., N.  (The  statistical  properties  of  are  assumed  to 

be  known.)  The  objective  is  to  estimate  A using  the  output  measure- 
ment {y^}  and  the  input  sequence  {u^}.  (See  Figure  3-1.) 


1 

1 

“i  rA 

A *i+l  J 

m 

Xi 

MLE 

y *i 

LU 

□ 

Basic  Model 
Figure  3-1 


The  matrix  A is  taken  to  be  general  except  in  certain  analyses, 
when  noted,  it  is  assumed  to  be  stable.  The  model  is  assumed  to  be 
observable  (in  the  deterministic  sense),  i.e.,  the  n x mn  matrix 
[Hv ,ATHr , . . . , (Ar)n  *HT J has  rank  n.  Further  restrictions  such  as 
canonical  forms*,  e.g.,  see  Lee [ 1964] , are  not  considered  here. 

The  type  of  estimator  of  A that  is  sought  is  one  which  is  based  on 
maximum  likelihood  but  operates  in  real  time,  requires  a fixed  and 
minimal  amount  of  data  storage  and  computation,  and  can  function  with 
only  normal  operating  input.  Separation**  of  identification  from  other 


* Though  going  to  an  equivalent  companion  matrix  system,  for  example, 
reduces  the  mxnber  of  parameters  to  be  identified,  it  is  not  clear  that 
attempting  to  recover  the  A matrix  from  an  equivalent  system  will  be 
any  less  difficult  than  directly  estimating  A.  While  in  some  situa- 
tions knowledge  of  the  equivalent  system  might  be  sufficient,  availa- 
bility of  the  A matrix  for  filtering  and  control  is  often  required. 

**  A separation  theorem  for  the  filtering  aspect  of  the  control  problem 
does  exist,  Joseph  and  Tou  [1961] , but,  apparently,  with  the  exception 
of  only  a few  special  cases,  Horowitz  and  Grammaticos  [1970] , one  has 
not  been  found  for  identification,  Sawaragi  and  Katayama  [1967] . 

| 
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aspects,  in  terns  of  the  total  control  problem,  is  assuned  to  be 
allowed. 

The  nature  of  the  a priori  information  on  the  initial  condition  x 

o 

significantly  affects  the  form  of  the  estimator  of  A.  Three  possibil- 
ities are  examined:  xQ  known,  xQ  unknown  parameter,  and  xQ  unknown 
random  variable  with  known  gaussian  distribution.1  A fourth  case  which 
is  related  to  the  first  two  but  based  on  a different  approach  is  also 
discussed.  In  the  terminology  established  in  Chapter  2,  the  first 
three  cases  are  similar  to  the  model-plant  error  formulation  while  the 
fourth  is  similar  to  the  equation  error  formulation. 

3.3  THE  NECESSARY  CONDITION  - xQ  KNOWN 

In  this  case  the  initial  condition  xQ  is  assuned  to  be  known.  The 
maximum  likelihood  (ML)  estimate  corresponds  to  finding  a "best"  fit  of 
the  solution  of  the  model  equations  to  the  system  output  measurements. 
The  likelihood  equation  for  the  scalar  model  will  be  developed  first, 
followed  by  the  vector -matrix  case. 

For  the  analysis  of  the  scalar  problem,  notational  changes  are 
made  in  the  model  as  given  in  Equation  (3.5).  The  scalar  model  is 
written  as: 

*1+1  * **i  + b“i 

yi  “ + ni  * “ 0,1,..., N (3.6) 

where:  ^TJ(0,az) 

Since  the  n j are  gaussian  and  independent,  the  distribution  of  y± 
follows  directly  from  (3.6)  as: 

yd  ^ T)(baix0  + hb  *1' 1 m 1,2,..., N (3,7) 
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The  maximun  likelihood  estimate  of  a,  denoted  by  a,  is  the  one  which 
maximizes  p(y2 , . . . ,yN;a)  or,  equivalently,  the  one  which  minimizes: 

N i 


0N(a)  = (Vi  ~ &lhx0  ~ hb  J? 

i=l  k*l 


i-k  , 2 
a uk-l} 


(3.8) 


As  expected,  the  estimate  will  be  a form  of  least  squares.  It  is 


independent  of  the  noise  variance  and  the  first  sample  yQ  (since  xQ  is 


known) . 


dQN(a) 

Forming  — — — = 0 and  rearranging  terms  give  the  necessary  con- 


da 


dition  as: 


N 


N i 


Xf  iai_lyi  * b J2  ^ Uj-u/i 


N 


- g « 


2i-l 


S i 


hbx „ 


- hb2- 


N i i 


£££  uj-l uk-l  * 0 


(3.9) 


where  a is  the  appropriate  value  of  " a " which  satisfies  (3.9) . For  the 
autonomous  case  ( u ^ = 0)  , the  necessary  condition  reduces  to: 


N 


N 


Siai~ly^  - hx0  ia2i_1  = 0 
i=l 

When  A is  an  n x n matrix,  the  density  of  y ^ is: 

i 

yj  ^ T)(HAix0  + H ^ Ai~jBuj.1,R)  i = 1,2,...,N 


(3.10) 


(3;11) 


and  the  cost  function  for  the  maximun  likelihood  estimate  becomes: 


N 


On  (*) 


| | Pi  - - H ^ Ai 


| I* 


(3.12) 


Taking  differentials  at  a stationary  point  gives: 


i 


1 
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a jl 

Ufa  (A)  - 2 ^ [(y*  - HA^XQ  ~ H £ Ai-JBuJ_1)TR~l(-H(AA*; 
i 

- H g fAA^jBUj^jJl  - 0 


(3.13) 


where 


A AP 


P 

- AP'1  (LA)AP~r 


By  rewriting  (3.13)  as  an  equation  in  the  traces  of  matrices  and 
using  the  ccmmutivity  of  matrices  under  the  trace  operation,  an 
equation  in  the  trace  of  the  product  of  two  matrices  of  the  following 
form  can  be  derived: 

0N(a)  = tr  [(D) (bA)]  = 0 (3.14) 

Since  the  differential  matrix  AA  is  arbitrary,  then  D equals  zero,  and 
the  necessary  condition  can  be  expressed  as: 

N i 

g = Tj  yj  A2"*x0yiTJ?"lIHP”1 


P~ 

N i 

2 

Pm 

Nii 


Sf]  Ai~Px0X0T(AT)  iHrR~1  HAP-1 
p=I 

Ai"Px0uJI2BT  (Ar)i“JnrR~1  HAP'1 


N i-1  i-k 

+ 2 E I!  Ai-k-PBuk_1yi'rR-1HAP-1 

i=2  k=l  p=l 


N i-1  i-k 

IP  Ai~k~PBu^_,x^T(AT>iHTR~1HAP~1 


AT  i-1  i-k 

~ T 

P= 

N i i-1  i-k 


- ^ ^ ^ Ai-*-PBu^_1UjIIBT(AT)i“^HTR"1HAP-1  - 0 (J.15) 

there  A is  the  appropriate  value  of  A which  satisfies  (3.15). 
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0,  then  (3.15)  undergoes 


considerable  simplification: 


(Note  that  if  x„  is  zero  in  the  autonomous  case,  then  the  root  A of 


(3.16)  is  indeterminate.) 


3.4  THE  NECESSARY  CONDITION  - x_  UNKNOWN  PARAMETER 


Now  the  initial  condition  xQ  is  taken  to  be  an  unknown  parameter 


interpretation  of  the  ML  estimate  is  basically  the  same  as  the  x( 


known  case  with  the  exception  that  the  initial  point  is  free  to 


participate  in  the  optimization  in  the  present  case.  As  with  xQ  known, 


but  accounting  for  the  free  initial  condition,  the  distribution  of  the 
ith  measurement  for  the  scalar  model  is: 


yQ  ~ TJ(hx0,a z) 

Once  again,  since  the  measurements  are  independent  and  gaussian, 


the  maximum  likelihood  estimate  is  a least  squares  estimate,  i.e., 


minimize: 


The  necessary  conditions  on  a and  xG  become: 


I 


V ' £ *' - * S 5 5 

W/r 

Forming  -gj-  leads  to  the  same  condition  as  obtained  with  *a 


(3.20) 


known,  i.e., 


N i 


xD  ^ ^ £ (t  - 

- hxQ2  ^ ia2i_1  - hhxfl  - jla21’^1  Uj.j 

- hh2  ^ ^ Tj  Ci  - * 0 ^ 

(Note  that  the  noise  variance  does  not  appear  in  the  necessary 
conditions  in  this  case  nor  when  xc  is  known.) 

Introduction  of  (3.20)  into  (3.21)  gives  an  expression  for  the 
stationary  points  of  0 ^ for  the  parameter  a as  a function  of  the 
measurements : 

{ |j  jfj  r « - «•»*♦“»  *», 
-EEtt''-15*"-  u, 

5-o  “ i-T  i-i 

* «>  Jj  Jj  Jj  £ « - 

e 

♦ h2*2  ££££  (2p  - s - i,.2«^r;-s-t-lu^lUt_2 


(3.21) 


- h2*2 


N S r r 


(r  - fc)a2 (*+P+r) "•"t-i } 


i-o  p-u  r-J  s 


(3.22) 
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For  the  autonomous  case,  (3.20)  reduces  to: 


yiai;/(h 


a21) 


(3.23) 


and  (3.22),  assuming  x„  not  equal  to  sero,  reduces  to: 

°N 

N N 


(3.24) 


For  the  vector-matrix  case  only  the  autonomous  version  is  treated 
because  it  yields  considerably  simpler  equations  than  the  nonautoncmous 
one  yet  illustrates  all  the  basic  steps  required  to  derive  the  latter. 
The  density  of  the  output  measurements  yj  for  the  autonomous  model  is: 

yi  -v rj(HA1xQ,R ) i - 0,1,... ,N  .(3.25) 

and  since  the  { y^}  are  independent,  the  cost  function  for  the  maximum 
likelihood  estimate  becomes: 


0N(a'xo) 


(3.26) 


Taking  the  differentials  of  x 

N 


o 


0 at  a stationary  point  of  gives: 
[-HAi(Ax0)]TR~l[yi  - HAix0J  - 0 


(3.27) 


or, 


or. 


x0)(Ax0)r(Ar)iHr]  - 0 


tr  [ - BA 1 

(Ar)iHrR~lyi  - ^ Mr)iHT«-1Hkix0  - 


(3.28) 


Let 


= 


T)1HTU"lHki 


(3.29) 


Since  all  models  were  assumed  to  be  observable  and  R-1  is  symmetric- 
and  positive  definite,  $w-1  exists  for  Win.  Then  xQn  can  be  expressed 


as: 


(3.30) 


Taking  differentials  of  A at  a stationary  point  of  C^givest 

N 

b‘Al0H(A’*o)J  " 2 (-H(M  )x0)rRA[ Vd  ~ H*  xQ]  - 0 (3.31) 

tr  [ ^ R"1^  - HAiXQ)xor(A')i~P(Ui)r(AT)p-lir ] - 0 

^ ^ (Ar)P-lHTR-lyixor(AT)i~P 
- ^ (Ar)P'1  HrR~l  HAixQxor  (Ar)i~P  - 0 


or 


or 


(3.32) 


Introducing  (3.30)  into  (3.32)  with  xQ  ■ xQ  gives  the  necessary 


N 


condition  for  A as 
N N i 


(AT)P~l  HTR~l  yd  y^R~lHA^N~l  (Ay‘)i~P 
N N N i 

(A T)P~l  HTR_1  HA1^1  ( At)JhvR~ 1 yi  HA*^1  <Ar)i ' 


P 


- 0 


(3.33) 


3.5  THE  NECESSARY  CONDITION  - DISTRIBUTION  OF  XQ  KNOWN 

In  this  section  the  initial  condition  xQ  is  assumed  to  be  an 
unknown  randan  variable  whose  density  is  known.  Again  the  nature  of 

A 

the  estimator,  roughly  speaking,  is  to  seek  an  A which  results  in  some 
best  fit  of  model  output  to  measured  output.  While  this  case  is 
intexmediate  to  the  previous  two  cases  with  respect  to  the  amount  of 
information  on  xQ  assumed  available,  the  polynomial  form  of  the  likeli- 
hood equations  is  substantially  more  difficult  to  obtain. 
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For  the  scaler  case,  take  xQ  distributed  as: 


(3.34) 


and  xQ  independent  of  (n^K  Because  the  random  variables  xQ,  nQ » n j , 
•••,  r\N  are  independent  and  gaussian,  and  their  joint  distribution  is: 

P(n*)  * (2*)  exp(-|(n*  - x*JrFj~1  fr)*  - x*)]  (3.35) 

where: 

n*T  - (x0,r)0,r)lf...fr)ff) 
x*r  - (Xo,0,...,0) 

Rj  - diag(e2,o2,. . . ,c2) 

From  the  scalar  model  equations  (3.6) , the  output  measurements  are 


seen  to  be  related  to  the  noise  and  initial  conditions  as  follows: 

y * Dr\*  + hbu* 


(3.36) 


where : 


yT  ■ (y0/yj/*--/yw) 

D = [ha  I I] 

aT  - (l,a,a  ,.,.,a  ) 

u*T  = (0,uo,u1+auo, . . . ,Ujy_2+*  • 

I = identity  matrix 

By  applying  the  theorem  on  the  linear  transformation  of  jointly 
gaussian  random  variables  (e.g.,  see  Anderson  [1958,  p.26])  to  (3.35) 
with  transformation  (3.36) , the  measurement  density  becomes, 

p(yta)  - (2r)  -*V2|  ^ expZ-jdy  - h*0a  - hbu*||2  (3.37) 


where : 


R2  - CRjb’ 
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From  (3.35),  (3.36),  and  (3.37): 

*2  " ff2j  * /*2e2“T  (3. 38j 

where: 

R^  ■ (N  + 1)  x (N  + 1)  matrix. 

(Note  that  the  y^  are  now  not  independent.  Further,  (3.37)  indicates 
that  the  ML  estimate  will  not  be  a least  squares  estimate,  nor  even  a 
Gauss-Markov  estimate.) 

It  can  be  shown  (see  Appendix  A)  that 

|i?2l  - (oz)N(o2  + h2z2aya) 

= ~2 [I  - (h2z2/ (h2z2aya  + o2))aar] 

* O - • — 

The  cost  function  can  now  be  expressed  in  the  following  form: 

0N(a)  - o2(log|R2|  + ||y  - hxQa  - hbu*||2 

*2 

Equivalently,  let  the  likelihood  function  L be  defined  as: 

L S ply ;a) 

Since  ply /a)  is  smooth  and  greater  than  zero  for-  all  y and  a, 
logarithm  is  monotonic,  the  stationary  points  of  L satisfy: 

""“1*2 1"1  -r  1*2!  + (y  - - «>u •)rR  -1(bZ0  + 

da  2 ' da  * o-  2 0 da 

-l(y  - bSQa  - hbu*)y  (y  - h*  a - hbu*)  - 0 

2 da  °~ 

Multiplying  through  (3.43)  by  [-a1*  (a2  + h2z2aya)2  ] gives: 

[a2 (a2  + h2z2aya) h2z2ayaa  - ly  - hr0a  “ hbu*)y[(oz  + h2z2aya)2I 

- (a2  + h2t2flTflHh2e2aaT';  ] (bSc  a + hbu*) 

o— a ® 

+ ly  - tic0 4 - hbu*)r[lhl*z>*iria)  (mt)  - la2  + h2e2aTa)  fh2z2)  t&alr)l 

ty  - *»0«  - - 0 


(3.39) 

(3.400 

(3.41) 

(3.42) 
and  the 

hb  *}L*) 
da 

(3.43) 
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where  > 


Dividing  through  by  h2c2,  expanding,  and  regrouping  by  the  various 


combinations  of  inner  products  results  in  the  following  necessary 
condition  for  a: 

{[a2  (a2  + h2eVa;  + h2xQ 2 Vc2Jaraa 

+ (yTa)  [2(ara)  (toe  )o2  + (aru  *)hb(o2  + h2c2a'ra)  + (yra)h2e2  (aTa) 

- 2(aru*)hbh2ez  (aTa  ) + (a  ru*)hb(o2  + h2t2ara)] 

" " **fl  ~a  "*  *“ 

♦ (yTaa)  [ (&ru*)hb(o2  + h2e2aTa;  - (yr§) (oz  + h2e2aTa>  - hx0Vo2 

- o2i»0aTa7 

- (yTu  *)hbh2c2(ara  + V)2 
a ""  ™ ’ 

+ (aTu*) [(aTu*) (ara  )h2b2h2z2  - (aTu  *)hzb2(o2  + h2z2aya) 


- 2hx  hboz (ara  )] 
o - -a 

+ (ayua*)  [hx0hb('Vo2  + o2Sra)J 

+ (a*u*)  [hbhia2  (aTa  + V)  - (aru*)  h2b2  (o2  + h2z2ar§)] 
+ fu*Tu *)[h2b2h2z2 (aTa  + V)2]}  - 0 


Rewriting  the  above  using  summations  instead  of  inner  products 


allows  many  of  the  terms  to  combine  (though  the  resulting  expression 


appears  less  compact  and  less  efficient  computationally) 


{[a2 (a2  + h2:2*)  + a2h2c2( 


- ura.iii 


■l 


f 


N N N p 

* hbh?t2t  (2p  - 2i  ♦ J - i)a2^i>'r+^yjur.1] 

N N p 

+ hl»*f  ” r + i)a*P'r+i“l9iur.1J 

* &z2l  I)  J]  J]  ^ " k)a2l+J+k~lyjyk]  - o2  J*  £ ia2+J~1yj 

J»o  j»o  i»0  j-3 


♦ hSnhbc2[ 


N N 


(2p  - r - 2i)a2(P+i)-r-1ur_1) 


i-0  p~l  r-J 


H0hb'Va2(  dP  - r)a2P-t-'ur.1l 


- hbhzt2(V 


rr 

*S 


a2i)2f 


N p 


(P  - r;aP“r_1ur_Jup_i; 


p-J  r*. 


+ h2b2h2z2 [ 


N N p N s 


p-I  r-i  s« 


(i  - 2s  + t)a2 (P+l+s) ~r“t"1ur_jut_j7 


♦ W/ 


ft  - 2sja2^>+s^~r“t:“1ur_2iifc_2/ 


p=l  S' 


+ [ti*b2z2  (V  + T*  a2i)2(  T 


(p  - s)a2p~r~s~1ur-ius-1)} 


p-l  r~l  s 


(3.46) 


For  the  autonomous  case  (3.45)  reduces  to: 


{[cr2(o2  + h2z2ara)  + 'ia2h2x  2]ara  + 2o2hx  (ara  ) (y ra) 


[Va*hx0  + o2hx0(a'a)](yraa)  + (h*c*araa) (y*a) 


^ r ^ a ^ 


- (a2  + h2z2ara) (y Ta) (yTa  ) } - 0 


(3.47) 


and  (3.46)  reduces  to: 


k 


1 


lo2(o2  ♦ h?s02v  ia 2i’1  * o2h2e2 

+ 02h2*o  jfj  ^ <2i-j)a2i+3-'yj  - 


♦ />2e2 


fi  - k)a2i+J+k-1yjyk  - o2¥hJo  T*  i«i~1yi> 


- 0 


(3.48) 


The  situation  where  A is  an  unknown  n x n matrix  presents  some 
difficulties.  Arriving  at  a density  for  the  vector  measurements 
analogous  to  (3.37)  for  the  scalar  case  is  straightforward  enough. 
Unfortunately,  no  useful  expressions  for  the  determinant  and  the 
inverse  of  the  covariance  could  be  found.  Since  this  precludes 
reducing  the  likelihood  equation  to  the  desired  polynomial  form  at  this 
time,  the  matrix  case  will  not  be  developed  here. 


3.6  THE  NECESSARY  CONDITION  - DIFFERENCE  EQUATION  ERROR  APPROACH 

The  general  class  of  parameter  identification  problems  treated 
in  the  previous  three  sections  could  have  been  approached  somewhat 
differently.  Instead  of  seeking  the  parameter  values  which  gave  a best 
fit  of  the  model  output  to  the  measured  output,  parameter  values  could 
be  selected  to  minimize  the  error  that  results  when  system  input  and 
output  measurements  are  introduced  into  the  model  equation. 

To  implement  the  latter  approach,  referred  to  as  the  "differencing 
approach"  in  the  following  discussions,  the  model  Equations  (3.5) 

(or  (3.6)  in  the  scalar  case)  must  be  rearranged.  The  assunption  that 
H-1  exists  is  made  to  facilitate  this  (which,  of  course,  means  that 
now  m » n.)  Working  with  Equation  (3.5) , the  rearranged  model  is 
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■ BAx±  HBu±  + n i+i 

or  i?i+l  " cVi  * BBuj  + i - 0,1,...  ,N  (3.50) 

where : 

C - OUT1 

* ni+l  * CrU 

(In  the  scalar  case,  C becomes  a and  becomes  W>.) 

The  equivalent  system  model,  Equation  (3.50),  has  two  important 
differences  relative  to  the  original  formulation  of  the  model.  In  this 
new  system,  the  state  variables  in  the  difference  equation  are  known 
as  opposed  to  the  x ^ in  the  previous  system  model  which  were  unknown. 

In  addition,  the  noise  variable,  though  still  zero  mean,  is  now 
correlated  and  acts  as  part  of  the  input. 

Just  as  was  demonstrated  with  the  original  formulation  of  the 
model,  the  form  of  the  ML  estimator  based  on  the  equivalent  model 
strongly  depends  upon  the  assumptions  made  about  the  initial  conditions. 
In  fact,  for  each  of  the  three  initial  condition  situations  treated 
earlier,  the  equivalent  model  leads  to  the  same  likelihood  equations 
and  thus  the  same  estimators  as  found  previously  - a none  too  sur- 
prising result  if  the  models  are  one-to-one.  To  see  this,  look  at  the 
case  where  xQ  is  an  unknown  parameter.  Prom  Equations  (3.50)  and 
(3.5)  , where  with  no  loss  of  generality  take  = 0: 

y0  m Bx0  + n0 

- cy0  + co 


" cyj»-l  + Cn-1 


(3. SI) 


Using  Equation  (3.50) : 


C*  * 7)(0,R *) 


where: 


0/v0/<.2/  '^s-r 


and 


R * 


R -RC 
-CR  R+CRCr  - RC * 


N V 


X -RC 


-CR  R+CRC 


\ 

\ \ 

\ % 

\ X 


0 


o V'r 

- 4 

From  Equation  (3.51),  the  Jacobian  J is: 

J " |!{H  " 1 


I -C 


\ \ 


\ V 


\ 


Equation  (3.51)  can  be  rewritten  as: 

J 


0 ' 

N-C  ' I 


** o + ^o 

• 

c 

o 

• 

at 

e 

• 

% 

■ h-1  ■ 

(3.52) 


(3.53) 


(3.54) 


(3.55) 


(3.66) 


(3.57) 


Using  Equations  (3.57),  (3.56),  and  (3.54)  along  with  (3.51) 
leads  directly  to  the  likelihood  function  (3.26)  except  for  the  mean 


I 
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H 


of  the  joint  distribution  as  given  in  Equation  (3.25) . However, 
recognizing  that: 


X 

X 

-c  N\  0 • 

X X 

1 

1 

I 

X 

X 

CN  X 0 

X x 

X x 

0 'x  \ 

~C  1 

ss 

: \x 

> • . >c  '-I 

(3.58) 


C1  * (HAH-1)1  = HA1H~ 1 (3.59) 

the  mean  is  easily  established,  and  the  equivalence  is  shown. 

There  exists  another  interesting  assumption  on  the  initial 
conditions  for  the  alternate  model,  Equation  (3.50),  that  can  be  made. 

In  this  case,  the  initial  condition  is  considered  to  be  u and  is 
assumed  known,  as  it  obviously  is  since  it  is  a measurement,  and  a 
deterministic  constant.  While  this  assumption  can  be  applied  to  the 
model  of  Equation  (3.50) , it  is  inconsistent  with  the  underlying  model 
of  Equation  (3.5)  which  says  that  is  a random  variable.  It  would 
appear  that  this  discrepancy  has  the  effect  of  assigning  an  improper 
weight  to  the  first  error  - an  effect  which  could  be  expected  to  have 
diminishing  influence  as  the  number  of  samples  increases.  Treating  yQ 
as  a known  deterministic  constant  in  the  alternate  model  gives  good 
results  experimentally  for  systems  that  correspond  to  the  original 
model.  Investigation  of  this  formulation,  irrespective  of  whether  or 
not  it  directly  applies  to  the  original  problem,  is  of  interest.  (See 
Mann  and  Wald  (1943]  and  Levin  [1964]  and  the  related  discussion  in 
Chapter  2.) 


The  likelihood  function  for  the  alternate  model  taking  y0  as  a 
known  constant  is  found  as  follows.  The  basic  set  of  equations  for 
this  problem  is  generated  by  (3.50)  indexed  by  i » 0,1 , . . . ,N-1 . 

For  this  system,  the  joint  distribution  of  , . * • is  easily 

shown  to  be: 

C ^17 (0,RZ)  (3.60) 

where : 

I 


and 


*z  - 


R+CRCr  -RC* 


-CR  R+CRCr  -RCV 

* v S. 


\ 

-RCT 


-CR  R+CRC1 


Since  the  Jacobian  J ^ for  Equation  (3.50)  equals  one  where: 

3C 


and 


yT  m (y% rV2' * * * 


then: 


nN  , 

Lz  - (2v)  2 |i?z|"2expf-^ry*  - f)'\~lCy*  - f)] 


where : 


(3.61) 


(3.62) 

(3.63) 

(3.6A) 


- n-vector 


HBu_ 


HBu 


N-l 


The  characteristics  of 


/ W - l(y2  ~ Cy0),...,(yw  - Cy^)) 

(3.65) 

the  likelihood  function  Lz  are  considerably 
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different  from  those  of  the  previous  sections.  The  means  of  those 
densities  in  all  three  cases  were  expressed  in  terms  of  powers  of  the 
unknown  matrix  A.  In  Lx,  the  mean  is  not  a function  of  A,  but  A does 
enter  linearly  in  the  set  of  differences  of  the  y^'s.  On  the  other 
hand,  the  covariances  of  the  previous  densities  were  of  the  form  a2 1 
or  where 0 denotes  the  Kronecker  product,  while  in  L_  the 

covariance  has  tri-diagonal  form  with  elements  which  are  of  the  form 
A or  It  A2. 

Because  Rz  is  not  diagonal  (in  the  sense  that  the  earlier  co- 

variances  of  the  form  10  ft  were  diagonal),  crudely  speaking,  the 

estimate  for  A is  found  by  fitting  a hyperplane  to  the  measurements 

in  the  non-trivial  norm  of  R . This  characteristic  is  interesting 

z 

relative  to  the  solutions  of  the  previous  sections.  There  the 
residuals  were  weighted  equally.  That  this  can  be  undesirable  is 
easily  seen  by  considering  the  scalar  autonomous  model  with  |a|<I. 

Then  early  measurements  have  more  useful  information  than  later  ones, 
and  thus  the  later  ones  should  be  weighted  less  than  the  early  ones. 

The  likelihood  function  Lz  will  give  uneven  weighting  to  the  residuals. 
Whether  or  not  this  will  yield  any  better  estimates  than  those  of  the 
previous  sections  is  not  immediately  clear  since  the  nature  of  the 
residuals  is  different,  and  the  arrangement  of  the  weighting  is  not 
obvious . 

Because  of  the  complexity  of  this  approach,  first  the  scalar 
autonomous  case  will  be  developed.  The  covariance  matrix  Rz  reduces 


to  the  N x N matrix: 


0 


(?  m O2 


2+a2  -a 


'x 

\ V 

Nx  \ 

V \ V 

SN  V\  XS 

Sx  \ -« 

0 \ sx 


-a  2+a* 


(3.66) 


It  can  be  shown  (see  Appendix  B)  that: 


Ml  - £ a2i 


and 


where : 


<2“l  - 


(3,.  67) 


(5.68) 


i-2  N-J  N 

r^‘l " 1 fi  <,2*;aj“i<'  a2t;/r  IJ 


i2*>  C?>iJ 


Also,  in  Equation  (3.64),  for  the  scalar  autonomous  case: 
n » 1,  f m 0,  and  C -*■  a 
dlogtz 

Forming  — — - 0 gives  after  considerable  manipulation  (see 


(3.69) 


Appendix  C) 


where: 


VN  + 0N'  - 0 


VN  » -2cr 


N W . 

2 r<  r»  „xi(p+<i>-i 


S 

-jo2  i it  t * t il’2k'1 

k~l  JmO  j“N 

JtiAl  k>W 

N 

T(  (k  + 1 - <i)Sk+1_„_2iJ)  y]»* 


(3.70) 


(3.71) 


JtiAl  k>N 

2(2N-1)  Jl 

On'  - £ f¥ 


(3.72) 


where : 
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As  a result  of  differencing  the  measurements,  it  would  appear 
that  information  about  the  initial  condition  is  lost,  and  consequently, 
this  case  is  similar  to  one  where  xQ  was  an  unknown  parameter.  In 
fact,  Qn‘  above  is  equivalent  to  the  first  term  of  the  necessary 
condition  for  xQ  unknown  parameter  case  (Equation  3.22),  i.e., 

N N N 

- t)a2t+i+j~lyiyj  (3.731 

For  the  scalar  plant  with  a forcing  function,  a necessary 
condition  in  polynomial  form  can  also  be  found.  However,  introduction 
of  the  forcing  function  destroys  much  of  the  symmetry  characteristic  of 
the  autonomous  case  and  as  a result  precludes  the  extensive  reduction 
in  complexity  the  autonomous  expression  for  QN‘  can  undergo  (see 
Appendix  C) . Because  of  the  similarity  of  this  case  to  the  xQ  unknown 
parameter  case,  qn'  is  probably  equivalent  to  the  necessary  condition 
for  the  latter  case  (Equation  3.22),  but  this  equivalence  has  not  yet 

( 


been  shown. 


The  necessary  conditions  for  the  vector-matrix  case  are  somewhat 
more  difficult  to  establish.  First  take  the  log  of  Lgi 


log  Lg(A)  - - f log  2v  - § log  |rJ  - j (y*  - £)TR~l  fy*  - f) 


Taking  differentials  at  a stationary  point: 

A flog  LZ(A))  - tx[R~xt>.(Rz)l  - t(y*)  TRZ  (y*  - f) 


where : 


- \(y*  - f)'b(Rzx)(y*  - f)  - 0 

ti(y*)r  - - (yof...,yti.1)(i^  AT>  - - y0Tfi®AT) 
y0r  " " (Vo* '""Vh-i) 

ARAt+ARAt  -RAr 


Af/?z;  - 


\ 

-AR  N \ 

V \ V 

\ \ '' 

\ \ 

0 \ N 

-A R ARAT+ARAr 


AfRz"1;  - - RZ~X  (L(R  ))R 


(3.74) 


(3.75) 


But  A (Rz)  can  be  rewritten  as: 

t . 


A(V  ' 


AR 


\ 0 


AR 


k \ 0 

.\\ 

-I  V 


A -X 


\ 

0 


RA’ 


\ 0 
• \ 


RA» 


(3.76) 


or 

AfRz;  - uN® f)[(iN^>AT)  - sn;  + [(IN®AT)  - snr](iN® a**; 

(3.77) 

where: 
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A*  = AJ? 

Ijf  » if  x JI  identity  matrix 

Using  the  above  and  the  properties  of  the  trace.  Equation  (3.74)  can 


be  written  as: 


trUdN©AT  - sn;*z_1  - (i®R~l)y0(y*  - 
- (In<S)Av  - Sn)Rz-l(y*  - f)(y * - f)rRz~1] 
A';)  - 0 


Definition:  Letflf  be  an  (nN)  x (nN)  matrix  partitioned  into  N2  equal 
suhmatrices  of  size  n x n.  Using  the  same  scheme  as 
associated  with  a matrix  having  scalar  elements,  denote 
the  Ijth  submatrix  ofj^  by  G±j.  Define  the  generalized 


trace  operation: 


ro»  * L 


(3.79) 


Since  A'  is  an  arbitrary  matrix: 


TR{  [ (Iff  © AT  - Sn)  - (i®R'l)y0(y*  - f)r 

- (IN®Ar  - Sn)Rz~l(y*  - f)(y*  - f)  T]RZ~1  > = 0 

(3.80) 

The  usefulness  of  the  necessary  condition  might  be  enhanced  if  an 
explicit  form  for  R * could  be  found.  However,  without  some  restric- 
tive assunptions  on  the  structure  of  the  matrices,  finding  an  explicit 
inverse  appears  to  be  difficult.  For  example,  when  R *=  a2I  and  A is 
normal,  i.e.,  AAT  • ATA,  then  the  inverse  can  be  found.  However, 
restrictions  on  the  structure  of  the  A matrix  invalidate  the  likelihood 
equations  whose  derivations  are  based  on  completely  general  variations 


of  A at  the  stationary  points.  It  is  not  clear  that  rederivation  of 
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the  likelihood  equations  with  any  such  canonical  form  for  A would 
result  in  any  benefits. 

Notice  that  to  derive  the  equivalent  system  model  (Equation  (3.50) , 
H_1  had  to  exist.  In  the  more  general  case,  H is  taken  as  an  m x n 
matrix  with  m < n.  If  m < n,  the  most  obvious  approach  is  to  use  a 
pseudo- inverse  form  for  H which  is  suited  for  this  problem. 

Another  approach  that  appears  promising  is  use  of  the  observer 
of  Luenberger  [1964] . With  this  scheme,  the  measurement  equation  can 
be  augmented  so  that  it  cam  be  inverted  directly  for  plant  state  x ^ . 
There  are  a nunber  of  difficulties  associated  with  this  technique,  not 
the  least  of  which  is  the  necessity  for  the  solution  of  a Lyapunov 
equation  for  the  matrix  required  to  augment  the  h matrix. 


3.7  PLANT  NOISE 

Consideration  of  plant  noise  introduces  additional  complexities 
in  the  task  of  finding  polynomial  type  likelihood  functions.  Typically, 
the  effect  of  plant  noise  is  to  add  a term  to  the  covariance  matrix 
for  the  system  without  the  plant  noise.  Finding  the  inverse  of  the 
covariance  becomes  the  problem  of  finding  the  resolvent  of  a matrix 
much  as  already  occurred  in  a simpler  form  in  the  case  of  xQ  with 
known  distribution. 

The  scalar  versions  of  the  four  cases  covered  in  the  previous 
sections  will  be  considered.  The  basic  model  (3.6)  with  plant  noise 


becomes : 


*i+l  " “i  + *°i  + Ci 
y . - hx.  + n. 


(3.81) 


If 

I 

* 

4 

I 


The  following  assumptions  are  made  on  the  properties  of  the  noise: 

(Cj)  independent  and  (Cj)  ^J(0,&2) 

independent  and  independent  of  { C_£ ) / ^ Tj(0,a 2) 

For  convenience,  the  likelihood  equations  will  be  presented  in 
a pre-polynomial  form.  The  polynomials  may  be  found  merely  by 
expanding  the  equations.  (The  derivations  of  the  likelihood  equations 
are  given  in  Appendix  D) . 

a.  Known  initial  condition  xQ 
The  likelihood  equation  is: 

o • l*yrir5£l*y|;  ♦ Uv„  - Koto  - 

- 2lhx0(£*ti)  + hb(^)uN]-'Ry~l}(yfl  - hxQaJi  - hbiu^)  (3  82) 
where: 


y^’  • 

V - (a,a2,. . . ,aN) 
UNr  m (u0,...,uN_1) 


Jp  - (o2)*  ff  (£$1  + 1 + a2  - 2a 
P k-1  a2 


and  J = 1 
o 

Lt  “ (o2  + h2e2>Jt-i  - a2a4Jt;_2 
and 


(3.83) 

(3.84) 

(3. 85) 


(3.86) 


cos.—U  ,pil  (3.87) 
P+1 


,t>J 


(3.88) 
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i 1 


! 


Lo  - 1 

I>2  » a2  + h262 

L2  - o4  + o2^2*2  (2  + a2;  + B**h4 


R~l  * 

'if1  - (<’2a)j~i(Li-1)(JN_j)/(I'N)  ,J*I 


(3.89) 

(3.90) 

(3.91) 

(3.92) 

(3.93) 


b.  Initial  condition  xQ  an  unknown  parameter 

This  case  is  identical  to  the  one  in  part  (a)  above  but  with 

A 

the  addition  of  an  equation  for  xc^.  Referring  to  the 
definitions  above  and  Equation  (3.19): 


(yN  - hx0an  -hb*uN)  TRy~l(aN)cz  + (yQ  - hxQ)  - 0 


(3.94) 


m [(yJi  ‘ AWVTVl5W°2  " yo1/[(1  + a2^TRy‘lVh; 


(3.95) 


c.  Known  distribution  of  initial  condition  x 


Assume: 


xQ  independent  of  (5^)  and  (n^h  xQ  ^7}(X0,cz) 

The  form  of  the  solution  is  similar  to  that  where  x is  known. 

o 

The  likelihood  equation  is: 


0 - * <rj„  - 


r“ 


■ 2lt*o  (7g»}  + ■ ^ofw  - 


where : 


V * (yo'yi'y2 V 

V • a'a'a2 a*> 


*o  ' 


ft  (B2  + o2  Cl  + a2;  - 2ao2  — 
*-l 


cos 


and  JQ  m 1 


(Note  that  the  number  of  samples  is  N + 1 . ) 
Lt  » [( a 2 + J^e2)  (B2  + a2)  + a2hzt2a2]Jt_2 
- a2cAit_j  ,ti4 


and 

~Lo  ” 1 

Lj  - o2  + h2e2 

i2  - (a2  + h2e2;  CB2  + cr2)  + h2e2a2o2 
L3  - Co2  + h2c2HCB2  + o2 ) 2 + o2B 2a2; 
+ o2h2e2a2CB2  + o2  (1  + a2)] 

l*yl  * 

fly”1  * (*iT) 


(3.96) 

(3.97) 

(3.98) 

(3.99) 

(3.100) 


(3.101) 


(3; 102) 
(3.103) 


(3.104) 


(3.105) 
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*ij~*  " (h-l)  (Xs+l-j)/(hn-l)  'J>-i 

| (h2e2a2  + 82  + - a2a'*Jlt v 


2<j<N+l 


The  likelihood  equation  is: 
0 " + * 


3.8  MINIMAL  SUFFICIENT  STATISTICS 


When  the  likelihood  equations  are  derived,  the  question  of  what 
might  be  their  simplest  form  inevitably  arises.  Concern  for  simplicity 


is  heightened  as  the  nvmber  of  samples  increases  because  computational 
effort  can  rapidly  increase  with  more  samples. 


The  problem  can  be  viewed  from  two  aspects.  One  is  purely 
algebraic  manipulation  to  reduce  complexity.  Any  approach  in  this  area 


is  basically  ad  hoc.  The  second,  which  in  a sense  is  a special  case 


■ | mu  . I 


of  the  first,  is  concerned  with  condensing  the  information  in  the  set 
of  samples  into  a smaller  set  which  contains  an  equivalent  amount  of 
information  about  the  unknown  parameter.  This  second  area,  which  is 
based  on  the  theory  of  sufficient  statistics,  has  formal  structure  and 
is  the  more  important  of  the  two  because  through  sufficient  statistics, 
the  amount  of  computation  can  be  stabilized  as  the  number  of  samples 
incraases.  Establishing  the  existence  of  (non-trivial)  sufficient 
statistics  for  the  four  cases  investigated  in  this  chapter  is  clearly 
of  interest. 

For  scalar  variables,  Dynkin  [1951] , on  whom  most  of  the  following 
discussion  will  be  based,  defines  a sufficient  statistic  in  the 
following  way.  Let  {p(x, 0)  :0  €@)  be  a family  of  probability  densities 
denoted  by  T,  defined  on  the  set  D in  the  m-dimensional  space  FT . 

The  function  x(x) , defined  in  D and  with  values  in  some  set  T,  is 
called  a sufficient  statistic  in  the  domain  D for  the  family  r,  if 
the  probability  densities  p(x, 0)  may  be  factored  into  the  form: 

p(x,Q)  - v[x(x)  ,*]w(x)  (x  € D,  9€®; 

(3.114) 

(For  a more  rigorous  definition  see  Rao  [1965,  p.110].) 

Then,  for  example,  if  the  samples  are  independent  and  iden- 

tically distributed,  the  existence  of  a sufficient  statistic  X would 
allow  the  following  factorization*: 

N 

TT  p(Xj ,01  - v[x(xlt...,xN) ,Q]w(xlt...,xN) 

i-1  (3.115) 

(Of  course,  from  a computational  point  of  view  what  is  desired  is  that 


* The  statistic  x - (x1,...,xl))  is  sufficient  and  is  known  as  the 
trivial  statistic. 
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X have  a recursive  form  such  as: 

X frj  / • • • /Xu  f m F[X  (Xie  . • • ,Xu_  1)  ,XN] 

1 " " (3.116) 

where  F is  some  reasonable  function.  Otherwise  nothing  is  likely  to 

be  gained.) 

If  the  sufficient  statistic  for  a family  r is  not  unique,  then  the 
question  of  which  one  is  most  desirable  arises.  Since  sufficient 
statistics  in  a sense  partition  the  sample  space,  a possible  charac- 
terization of  the  most  desirable  one  is  that  it  impose  the  coarsest 
partition  on  the  sample  space.  Pursuing  the  approach  more  formally, 
Dynkin  says  let  Xj(x)  and  be  defined  in  D.  Then  Xj  is  dependent 

on  X2  it:  follows  that  x^I*')  " X2<x")  implies  Xj(x')  m Xj  (x”)  . 

This  gives  a partial  ordering  among  the  sufficient  statistics.  The 


statistic  x(x)  is  called  a 


statistic  for  the  family  r in  the 


domain  D if  it  is  dependent  on  every  sufficient  statistic.  A statistic 
which  is  both  necessary  and  sufficient  is  minimal  sufficient. 

In  order  to  test  for  minimal  sufficient  statistics  two  theorems 
of  Dynkin,  as  corrected  by  Brown  [1964] , for  scalar,  independent 
identically  distributed  samples  are  useful.  The  first  theorem  (Theorem 
2)  considers  the  linear  space  L{T,D)) generated  by  constants  and  the 


functions  g (0)  for  any  6 € © where 


gx( B)  ■ log  p(x,B)  - log  p(x,6o; 


(3.117) 


and  8o  some  fixed  element  in©.  If  the  functions  1,  (x)  , . . . ,$r (x) 

are  a basis  in  L(T ,D) , then  for  N>r  the  system  of  functions: 

Xi(x1,x2,...,xN)  - ^(Xj)  + ...  + *±(xH)  i - l,...,r  llg} 

is  shown  to  be  a minimal  sufficient  statistic  for  the  sample  of  size  N . 

In  the  second  theorem  (Theorem  3a)  Dynkin  shows  that  if  the  probability 
density  of  the  sample  has  the  form 
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(3.119) 

then  the  here  correspond  to  those  of  the  previous  theorem  (Theorem  2) 
and  thus  form  a minimal  sufficient  statistic  when  sunned  as  in  (3.118). 

However,  Dynkin's  results  do  not  apply  directly  to  the  four 
initial  condition  situations  in  the  previous  sections  because  identi- 
cally distributed  samples  are  assumed  for  the  above  two  theorems.  The 
case  of  independent  samples  which  are  not  necessarily  identically 
distributed  is  treated  by  Zhuravlev  [1963] . A theorem  based  on  the 
above  theorems  of  Dynkin  is  presented  which  results  in  the  desired 
generalization.  In  this  case  each  p^(x,Q)  of  the  form  given  in  (3.119) 
has  associated  with  it  the  sets  of  functions  and  (^fx)}^. 

where  1 > j > k > N , the  number  of  samples.  The  minimal  sufficient 
statistics  are  found  by  forming  linear  combinations  of  various  sets 
j Sifter  examining  the  amount  of  dependency  in  spaces  generated  by 
all  possible  combinations  of  the  sets  j.  Non-trivial  minimal 
sufficient  statistics  result  only  if  the  dimension  of  the  space  gener- 
ated by  is  less  than  k. 

Applying  Zhuravlev  to  the  known  xQ  and  unknown  parameter  cases 
shows  no  non-trivial  sufficient  statistics  exist.  In  the  known  case 
the  ith  sample  is  distributed  as: 

(3.120) 

The  {C*}  for  the  ith  sample  are  a-*,*-*-1 , . . . ,a,l.  Clearly,  for 

i » 1,...,  N the  dimension  of  (C.)  x...x(Cfc}  is  not  less  than  N . The 

k i k N 


pi(yi,a)  - (2xo*)  * exp  - « 


- 


* 

£ 

i-i 


same  conclusion  holds  when  xQ  is  an  unknown  parameter.  For  the  other 
two  cases,  none  of  the  above  theorems  apply  because  the  samples  are 
neither  identically  distributed  nor  independent.  By  factoring  the 
covariance  matrix  and  transforming  the  original  samples  with  the  fac- 
tors, a transformed  set  of  samples  which  are  independent  can  be  found. 
Nothing  is  gained  by  this  approach  because  the  transformed  samples  are 
unknown  quantities  since  they  depend  on  the  parameter  a.  The  existence 
of  non-trivial  sufficient  statistics  for  these  cases  is  unlikely 
because  of  the  sample  dependence.  (The  literature  offers  little  for 
investigation  of  the  vector  sample  versions  of  the  four  cases.  Seme 
related  work  was  done  by  Bamdorff-Nielsen  and  Pedersen  [1968]  .) 

The  conclusion  from  all  of  this  is  that  the  degree  of  the  polynomials 
which  define  the  necessary  conditions  for  the  ML  estimate  will  in- 
crease without  bound  as  the  number  of  samples  increases  without  bound. 


SECTION  IV 


PROPERTIES  OF  THE  IDENTIFIERS  AND  THEIR  APPROXIMATIONS 

4.1  INTRODUCTION 

The  properties  of  the  parameter  estimate  a and,  in  particular, 
the  characteristics  of  the  roots  of  the  likelihood  equations  for  each 

I i 

of  the  four  cases  treated  in  Chapter  3 need  to  be  considered  in  order 
that  some  evaluation  of  the  practicality  of  these  estimates  can  be  made. 
Both  the  degree  to  which  the  ML  estimate  a can  be  expected  to  approxi- 
mate the  true  value  of  a and  the  extent  of  the  effort  required  to 
determine  a are  of  interest.  (Discussion  of  the  latter  is  primarily 
the  topic  of  the  next  chapter.) 

The  investigation  of  a is  developed  in  two  parts  - finite  sample 
properties  and  large  sample  or  asymptotic  properties.  To  this  end,  a 
number  of  questions  could  be  posed  such  as  the  bias,  consistency, 
efficiency,  asymptotic  distribution,  and  uniqueness  of  a as  well  as  the 
number  of  real  roots,  if  any,  of  the  likelihood  equation  and  their 
sensitivity  to  the  measurements.  While  furnishing  answers  to  these 
questions,  as  well  as  related  ones,  might  be  desirable,  in  general  this 
tends  to  be  difficult  to  accomplish.  Some  of  these  questions  plus 
possible  approximations  to  the  ML  estimate  are  explored  below,  but  for 
simplicity,  generally  only  the  scalar  versions  of  the  four  cases  in 
Chapter  3 are  investigated. 

4.2  FINITE  SAMPLE  CHARACTERISTICS 


When  the  number  of  samples  is  finite,  purely  deterministic  analy- 
sis of  the  likelihood  equations  is  of  only  minimal  value.  Short  of 


? 


forming  confidence  limits  for  the  properties  of  interest,  limiting 
forms  and  averaging  appear  most  advantageous  for  finite  sample  analysis. 

For  this  analysis,  let  a0  and  be  the  true  values  of  a and  xq, 
the  parameters  to  be  identified,  a and  x0  be  their  ML  estimates,  as 
previously,  and  a and  xQ  be  points  in  some  subset  of  the  real  line. 

Then  a measurement  y±  can  be  expressed  as: 

I 

i 

yi  " b^o^oo  + **  £ + ni  (i  m 2'2 *>  C4-D 

except  for  the  differencing  model  in  which  case  y±  becomes: 

Vi  m ao2Wo  + 2^  (hbuj-l  + (1  - 1,2, (4.2) 

4.2.1  LIMITING  ESTIMATE  FOR  ZERO  NOISE 

One  of  the  simplest  questions  to  answer  about  the  likelihood 
equation  is  what  happens  to  a and  %0  as  o2  or,  equivalently,  as  the 
noise  measurement  goes  to  zero.  When  the  initial  condition  xoa  is 
known,  introducing  (4.1)  with  « 0 into  (3.9)  with  the  above 
notational  changes  gives : 


JL  i i 


* * £ £ £ “ ' 
* “2  S £ fk  11 ' 


(4.3) 


Clearly,  DN(aQ)  • 0,  but  a0  may  not  be  the  only  root.  However,  if 
n i - 0,  then  from  Equation  (3.8)  0N(ao)  - 0 and  a - aQ  (uniquely,  unless 
xoo  " 0 and  Uj  = 0),  (Of  course,  to  show  the  above,  the  (fata)  equation 
could  have  been  appealed  to  directly,  but  then  the  details  of  what 
happens  in  Df/(a)  as  o2  ■+  0 would  not  have  been  illustrated.) 

When  x0  is  an  unknown  parameter,  the  limiting  condition  can  be 
found  by  introducing  (4.1)  with  r\i  = 0 into  (3.22).  Equivalently, 


working  with  with  (3.20): 


N i 


(h  a01a1x00  + hb  ^ a0i~Ja1uJ,2 
* 

N i N 

- a2i~^Uj_1) /(h  a2i) 


(4.4) 


Setting  a equal  to  an  gives  £ = x_. 

° o oo 

From  the  discussion  on  the  xQ  known  case  and  the  above  result, 
aQ  is  seen  to  be  a root  of  the  likelihood  equation  for  the  xQ  unknown 
case.  Thus  a - aQ. 

In  the  case  where  xQ  is  an  unknown  random  variable  with  known 
gaussian  distribution,  introduce  (4.1)  into  (3.46)  and  let  r\2  go  to 
zero.  (To  be  consistent  in  taking  the  limit,  a2  must  also  go  to  zero.) 


This  gives: 


Du(a)  - h^c2x  2 


N N N 

S 5 S 


(i  - k)a2i+)+k~xaJ+k 
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(p-s)  a2  ur.iu s_2; 


(4.5) 


By  symmetry,  DN(a0)  - 0. 

Note  that  the  covariance  inverse  (3.40)  is  positive  definite  for 
o2  > 0 and  o2aTJ?z"1a  -*■  0 as  a2  -*•  0.  Consider  the  cost  function  Ojf(a)  , 
Equation  (3.41).  Then,  if  o2  ■ ni  m 0,  0N(a)  is  a minimum  when  a ■ aQ . 

« a 

Thus,  provided  a1  * 0,  once  again,  as  *►  0,  a aQ. 

The  same  conclusion  holds  for  the  differencing  approach.  As  in 
the  previous  case,  o2  must  be  set  to  sero.  Once  this  is  done,  the 
equations  are  identical  to  those  in  the  xQ  unknown  parameter  case  for 
which  the  limit  has  been  shown. 
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4.2.2  ESTIMATES  FROM  AVERAGED  LIKELIHOOD  EQUATIONS 

Another  finite  sample  property  is  the  parameter  value  which  on  the 
average  is  a root  of  the  likelihood  equation.  In  other  words,  if  L is 
the  likelihood  function  and 

r(a)  - log  L(a)  (4.6) 

then  which  a,  if  any,  results  in  the  expectation  E(r)  - 0.  (This 
property  is  obviously  closely  related  to  the  one  for  zero  noise.) 

Let  y € Y be  the  N samples  from  one  of  the  four  models  previously 
considered,  where  the  samples  could  be  vector  quantities  and  the  system 
either  autonomous  or  not.  The  parameter  vector  9 » (0 j,...,0^)  is 
taken  as  appropriate  for  the  model,  e.g.,  (a)  or  (a,xo/)  or  the  elements 

A 

of  the  A matrix,  etc.  Let  0Q  and  0 be  the  true  value  stimate  of 

0,  respectively,  where  0 , 0 € ©.  The  likelihood  function  l is 

o 

defined  as  previously,  L - p( y,0) , and  let 

ri(Q)  - log  L (4.7) 

Assume  log  L and  r ^ , i ■»  l,...,p  are  continuous  on  F®®and  that|log  l| 
and  |rj|  are  bounded Vy  € Y,  0 € © by  functions  on  Y which  are  integra- 
ble  over  Y.  Then  the  following  theorem  can  be  stated: 

Theorem  4.1:  For  N samples,  on  the  average,  the  true  parameter  value 
is  the  maximum  likelihood  estimate  of  the  parameter. 

Proof: 


t " a 

B[ri(6)]  -I  l log  p(y  ,6)  ] p(y,eQ)dy 
^ —00 

- 0 a 


(4.8) 
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If  0 - 0O  , 


BlrifQo)]  - [f 


(4  ..9) 


By  the  above  assunptions , 


eIti^q))  - f-JL.  f p( y,6)  <Jy7  | - 0 

30, • 0o 


(4.10) 


This  conclusion  unfortunately  does  not  directly  answer  the  question 
of  bias  of  0.  Showing  that  on  the  average  the- true  value  of  0 is  a 
root  of  the  likelihood  equation  does  not  necessarily  mean  that  the 
average  of  the  root  0 is  0Q. 

4.2.3  THE  STABLE  ROOTS  OF  THE  LIKELIHOOD  EQUATIONS 

The  question  of  the  number  of  -eros  of  a real-valued  polynomial  in 
an  interval  of  the  real  line  is  at  best  difficult  to  answer  in  general 
but  when,  in  addition,  the  coefficients  of  the  polynomial  are  random 
variables,  as  is  the  case  with  the  likelihood  equations,  general  state- 
ments with  much  practical  value  are  rare.  Kac  [1943,  1959]  investigated 
the  average  number  of  real  roots  of  a real  nth  degree  polynomial  whose 
coefficients  are  independent  identically  distributed  normal  random 
variables.  The  results  are  in  the  form  of  complicated  integrals.  The 
conclusions  indicate  that  the  density  of  the  root  distribution  peaks 
at  ±1,  and  the  average  number  of  roots  within  the  interval  (-1,1)  is 
the  same  as  the  average  number  outside. 

As  might  well  be  expected,  more  may  be  said  about  the  nature  of 
the  roots  of  the  likelihood  equations  if  these  equations  are  investi- 
gated directly  rather  than  through  the  general  case  considered  by  Kac. 
For  simplicity,  the  discussion  will  be  limited  to  autonomous  models. 
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The  analysis  of  each  of  the  four  cases  has  the  same  pattern.  First 


the  nimber  of  roots  of  the  underlying  deterministic  portion  of  those 


terms  in  the  likelihood  equation  with  random  coefficients  is  investi 


gated  in  the  open  interval  (-1,1).  Then  assessments  of  the  effects 


of  the  remaining  purely  deterministic  terms,  if  any,  and  the  purely 


When  the  initial  condition  x ^ is  known,  the  likelihood  equation 


for  the  autonomous  plant  is  given  by  (3.10)  and  (3.6) 


The  plant  is  assimed  to  be  stable,  i.e.,  -1  < a < 1 


istic  portion  of  (4.11)  (the  bracketed  quantity)  will  be  treated  first 


In  the  interest  of  clarity,  some  lemmas  are  presented  prior  to  the  main 


Lemma  4.1:  Let  N be  a positive  integer,  and  let  k be  a real  nimber 


where  0 < k < 1.  Then 


When  N * 1 


2k/(l+k)  < 1 


Find  max 


Clearly,  kQ  ia  where  the  maximum  in  (0,1)  occurs.  Then 


k < 1.  Then 


(N+l)k(l 


(N+l)k(l 


Establishing  the  sign  of  the  deterministic  term  of  (4.11)  when 


0 < a0  < 1 and  -a  < a < 0 is  complicated  by  the  fact  that  the  sign  of 


the  ith  term  of  the  simulations  is  either  positive  or  negative  depending 


on  whether  or  not  i is  even  or  odd.  The  next  lemma  deals  with  this 


situation 


are  such  that  0 < aQ  < 1 and 


then 


Make  the  following  definitions 


Inductive  proof  for  N even: 


At  N » 2, 


A,  - B0  « a + 2aa  2 ~ « - 2a3 
if  n o o 


> -2a  + 2a (aQ2  -a2) 


(4y20) 


Assume  *N-1  > *N  _2 , N-l  even.  Then, 

*N+1  ‘ Bn+i  - An_2  - Bn_j  + NaN~*aQN  + (N  + l)aNaQN^ 

- Na2N~*  - (N  + l)a2N+l 
> Wa^a/  + (N  + i;a%w+1  - Na2"-' 


(N  + l)a2N+l 


Let  a » -Aa0,  0 < k < 1 


(4.21) 


(4'.  22) 


Then, 


A^+i  - > WcN"1a02W‘1  - (W  + l)l^a02N+1  + 

+ fJV  + l)k2N+1a(2N+l 
- k^^ag2""1  [N  - (N  + l)kaQ2  + N** 

+ (N  + l)kN+2aQ2] 

« Jtw-1a02W"1  fWM  + **0  - (N  + l)a02k(l  - kN+1)) 
> 0 (by  Lenina  4.2)  (J4.23) 


Now,  odd  N: 


At  N * 1 , A2  - B2  * aQ  - a > 0 


(4..  24) 


Let  N be  odd,  and  using  the  above  conclusions  for  when  N was  even, 

*»  - »» - Vj  - \-j  - »*"' V - »aa"'1 


> - »a2”-‘ 

- NA*f_la02W“l  (1  + k**)  > 0 

With  the  above  lemmas,  the  main  theorem  follows  easily. 


(4.25)| 


Theorem  4.2:  Assume  the  initial  condition  is  known  and  the  plant 
is  scalar,  stable,  and  autonomous.  Then  in  the  limit  as  the  measurement 
noise  n j goes  to  zero,  the  likelihood  equation  for  the  unknown  parameter 
aQ  has  only  one  stable  root.  Furthermore,  that  root  is  a0. 

Proof : 

If  h or  Xqq  (in  Equation  (4.11))  is  zero,  no  conclusion  on 
root  distribution  can  be  made.  Assume  h and  Xqq  are  not  zero. 
Consider  the  sign  of  A„  - B„,  N a positive  integer,  for  a £ (-1,1) 

N N 


where 


aN  " £ iai~laoi  and  Bn  " ZJ  ia2i_I 


If  a0  is  zero,  the  conclusion  is  immediate. 
Take  aQ  positive,  i.e.,  ac  £ (0,1).  Then, 

1.  aQ  < a < 1 

Compare  the  jth  terms  of  AN  and  BN,  1 < j < N 
Since  jal~laQj  < ja2!"1 , AN  < B^. 

2 . aQ  m a , then  Ajf  * fijy  • 

3.  0 < a < aQ 

Since  ja^~laQi  > ja2l~x  , AN  > BN. 

4 . a ■ 0 

Since  aQ  > 0 , AN  > BN 

5.  -aQ  < a < 0 

AN  > Bn  by  Lemma  4.3 

6.  -1  < a s ~aQ 

ja]~^aQi  > ja2J~* , j odd 
jaJ~la0J  j*  ja23~l,  j even 


I 
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Since  summations  on  j in  an  and  Bff  go  from  1 to  N,  AN  > BN  . 

The  above  follow  similarly  when  a©  € (-1,0).  | 

The  effect  on  the  above  conclusions  due  to  the  polynomial  noise 
N 

term  in  (4.11),  ][  ia2"1)!.,  is  not  very  clear  as  indicated  by  the 

i-2 

earlier  discussion  of  Kac's  work.  Perhaps  the  most  important  character- 
istic of  the  term  which  can  easily  be  determined  is  its  expectation. 
Since  the  ni  are  zero  mean  and  independent,  the  expectation  is  zero, 
and  thus  on  the  average  the  conclusions  in  the  above  theorem  hold  for 
the  likelihood  equation.  However,  for  a given  realization,  the  noise 
term  equals  r)j  when  a - 0.  Except  for  that  possible  biap  in  the  neigh- 
borhood of  a • 0,  simulation  results  indicate  that  the  polynomial  should 
be  smooth  on  (-1,1)  for  finite  N.  While  the  number  of  roots  of  the 
noise  polynomial  is  N-l,  one  would  not  expect  all  of  them  to  be  real, 
nor  all  the  real  ones  to  be  in  (-1,1).  For  small  variance  relative  to 
*oo  and  aQ,  the  noise  term  probably  only  has  the  effect  of  biasing  the 
{deterministic)  root  of  the  likelihood  equation  in  (-1,1). 

When  the  initial  condition  x ^ is  an  unknown  parameter  xQ,  the 
likelihood  equation  for  the  scalar  autonomous  plant  is  given  by  (3.6) 
and  (3.24) : 


I).***]-',] 


• '“oof  E Jl 

N N 

+ ( j - i;a2i+^-1ni  - 0 


(4 , 26) 


The  number  of  roots  in  the  open  interval  (-1,1)  for  the  deterministic 
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portion  of  (4.26)  (th«  bracketed  quantity)  is  investigated  below. 
Again,  preliminary  to  the  main  theorem,  seme  lemmas  are  presented. 


Lemma  4.4:  Let  z be  a real  number  where  0 < z < 1 and  let  n be  an 


integer  where  n > 2.  Then, 


z -nz+n-l>0 


(Beckenbach  and  Bellman  [1961])  (4.27) 


Proof : 


zn  ~ nz  + n - 1 = (z  - 1)  + zn~2  +...+  z - n + 1)  (4.28) 

Since  zn~1  +...+  z < n - 1 (4.29) 


zn  - nz  + n - 1 > 0 


(4.29) 

(4.30) | 


Lemma  4.5:  Let  aQ  and  k be  real  niznbers  where  aQ,  k €.  (0,1),  and 
let  M be  an  integer  where  M > 2,  Then, 

l-*3a04  M-l  l+kM~l  M-2  l-k*~2 

ka^  T 1-k M + k ~M~  l-k*  > 1 


(4.31) 


Proof: 


l-k3a04  M-l  1+k M~l  M-2  1-k*-2 

> 1-k2  M-l  + k M-2_  1-k M~2 


(4.32) 


k M 1-k? 


M 1-k 


Placing  the  right  side  of  (4.32)  over  a common  denominator 
and  subtracting  the  denominator  from  the  combined  numerators  gives: 
(1-k2)  (M-l)  (l+kM~l)  + k2(M-2)  (l-k1*-2)  - kMd-k?)  (4.53) 

= (M-l)  - Mk  + (M-2)  k2  - (M-l)k3  + (N-l))?~l  - (M-2)  l? 

+ Mk?+l  - (M-l) k?+2 

« [ (M-l)  - Mk  + Jt*-1;  + ( (M-2)k2  - (M-l)k2  + k?+1] 

+ (M-2 ik*-1  (1-k)  + (M-l))?*1  (1-k)  i4) 

> (A*-1  - (M-l)k  + (M-2))  + 1 - k + Jt2(kM“1  - (M-l)k  + M - 2) 

(4.35) 


> 0 (by  Leona  4.4) 
Also,  kM(l  - &)  > 0 


(4.36)  | 


Lemma  4.6:  Let  a and  aQ  be  real  nixnbers  where  -1  < -aQ  $ a < 0. 

Let  N and  j be  even  integers  where  N * 2 and  0 $ j $ N-l. 

Then: 

d - aQ3+1aN*2^ [a(N-j-l) (aQN~3~l -aN ~ 1 ) + (N-j-2)  (a0N~l~z-aN~J~2 ) ] 

+ a0Ja»+2J’2[a(N-j)(a0N’l-aN-3)  + (N-j-1) (a0N'j-1 -aN~J’' ) 1 

> o (4,37) 

Proof: 

Let  a » -kaQ,  0 < k < 1.  Then, 

d - ^+2j-2a(N-l  [-kaQ2(N-j)  (1-kP-J)  + a-*3a04)  (N-j-1)  (1+k*1']-1) 

+ *2a02 (N-j-2)  (l-lf-J-2)]  C4.38) 

> 0 when  * » J,  i.e.,  a *•  -ac 

Por  -a  < a < 0 , 


aN+23~2a0N+1k(N-j)  (l-kP-3)  [iZ*l2aL  N-j~1  1+kN J 1 

*ao2  N-j  l-kN~J 

-kN-l- 

N-j  1^3 


. , N-j-2  l-k*-J-2 
+ k — ■! 1] 


> 0 (by  Lenuna  4.5)  | 

Referring  to  (4.26),  make  the  following  definitions  for  the  subsequent 


discussions : 


N N 


v = ES 

i=0  j=0 

(4.39) 

N N 

■-Sjc 

(4 -.40) 

N 

N-l 

dV  - V - AN-l'  m £ i*W+i~X*oi 

+ JZ  Na2i+N-1*0N 

i-0 

(4.41). 
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2N+1-1,  i 


ia2i+W-l  N (4.42) 


Then  Jth  term  of  dA 


ja2N+j-la  1 + Na23+N~*a  N - Na2N+^la  1 - ja21+N-'a  N 
-/  o o o o 

- (N-j)a JaN+23~*  (a  N~3-aN~3) 
o o 


Establishing  the  sign  of  the  deterministic  part  of  (4.26)  when 


0 < an  < 1 and  -a„  < a < 0 is  even  more  difficult  than  was  the  case 


when  x was  known.  Now  the  signs  of  the  terms  depend  on  both  N and  a 


Note  that  for  N even 


However,  when  N is  even  the  Jth  term  of  (dAN ' -dBN ' } + (dAN~j  ' ' ) 


is  positive  when  j is  even,  but  if  j is  odd,  it  can  be  negative 


The  next  lemma  deals  with  this  prob’em  through  showing  that  for 


are  such  that  0 < a„  < 1 and  -a„  < a < 0,  then 


(aQ-a)  ll+2a(a0+a)  + a0aJ7 


0 < k £ 1.  Then 


(4.47) 


> 1 - 3a0k  + J«02*2  - «03*3 

- (J-a0«3 


> 0 

Thus,  A2 ' - *2*  > 0 
Assume  -2*  > bN-2'  > H and  even.  From  (4.41)  and 


(4.48) 

(4.49) 


(4.42), 

An'  m AN-a'  + dAl V*  * dAN-l' 

Bn‘  “ aAf-2'  + dfliv'  + dBN-l ' 

Consider  the  sum  of  the  jth  and  j+lat  terms  of  (dAN’  -dBN‘ ) + 

(dAll-l' ~dBN-l' ) when  j is  even  and  0 < j < N-2.  From  (4.44),  this 
sum  becomes)! 

* (N-j-aHaJ'-j-Z-a"-]-2)] 

+ aQJaN+2J-2[a(N-j) (a0N~J-aN~l)  + (N-j-1) (a0N~J~l -aN~J~1)]  (4.50) 

> 0 (by  Lemma  4.6) 

Also,  when  j ■ N,  dAN'  - dBN'  ■ 0. 

Therefore,  for  N even  and  * 4, 

dAf/'  + dAfj-i'  > dBif  * + dBtf-i ' 
or,  for  N even  and  i 2, 

AW'  > BW' 

Now,  take  N odd. 


(4. 51) 

(4.52) 


At  N - 1, 

Al'  ~ Bj'  * ac  - a > 0 (4.53) 

For  odd  N > 1,  the  jth  term  of  dAN'  - dBN'  from  (4.43)  becomes, 
(N-j)a0JaN+2l~l (a0N~J-aN~J)  > 0 V J (4.54) 

For  j even,  the  inequality  in  (4.54)  is  strict.  Thus,  for  odd  N, 
dAw'  - dBff'  > 0 (4.55) 


From  (4.52)  and  (4.55)  for  odd  N > 1, 


Aft'  - Bn ' m * * dAtf  * - dBfi ' > 0 

Lemma  4.8:  If  N is  a positive  integer  and  the  real  numbers  aQ 
are  such  that  0 < a0  < 1 and  -i  < a < -a0,  then,  an'  > bN ’• 
Proof: 

Inductive  proof  for  N even: 

At  N » 2,  from  (4.46) , 

A2'  ~ B2*  m fl+ia(a0+«)+a0a3; 

Let  b • -a,  a0  ■ Jlcb,  0 < k < 1.  Then 

[1  + 2a(aQ+a)  + aQa3;  - I - 2b2  (k-1)  - Ab4 

- <l-kbh)  + 2b2 (l~k)  > 0 

Or,  A2'  > B2* 

Assume  aN-2'  > bW-2 ' # W > 4 and  even . 

Consider  the  svm  of  the  jth  and  j+lst  terms  of 

(dA N‘ -dBN' ) + (^As-l’ ~dBN-l' ^ where  7 i*  even  and 
0 < j < N-2.  From  (4.44)  this  sum  may  be  expressed 
aQl+'alf+2J[a(N-j-l)(a0N’1-i-aN-J-h  ♦ (N-J-2)  (a0N-^2-aN~- 
+ a0JaN+23-2[a(N-j)  (a^-J-a*1-})  + (N-J-l)  (a0B~J~' -aP~J~l ) < 

- a0JaP*23~2 l (l+a*aQ) (N-j-1) (a0N~1~l-*P~l~* ) * a(N-J) (aQN' 

+ a2aQ(N-j-2) (a0N~l~2-aN~l~2) ] 

> a jaN+23~2(a(N-J)  (aB~l-aN~l)  + a2a(N-j-2)  (aN~3~2-aN~- 

o o 0 0 

- aQlaN+2J~2  [bP~J+*  (N-j)  (1-k N~3)  - ktP~J+l  (N-j-2) 

> 0 

When  j m N,  dAN‘  - dB^ ' ■ 0 
Therefore,  for  N even  and  > 4, 


(4.S6>  I 

and  a 

(4.57) 

(4.58) 

as, 

'~2;; 

■j-aN-i> 

]-2)i 

)i 

(4.59) 
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Thus  A 


■■ 


dAw'  + dAN~l ' > dBW*  + dBN-l' 
Or,  for  tf  even  and  i 2, 


(4.60) 


Take  N odd. 


At  W - 2,  Aj'  > flj'  by  (4.53) 


For  odd  N > 1,  consider  the  jth  and  J+lat  terms  of  dAN'  - dBN' , 

J even: 

(N-j)a0W+  2J-1  (aoN-j-aN-j}  + (aQN-j-l-aN-j-\ 

- a03aN+23~x  (l+k**~J)  - k(N-j-l)l/,~j+2(l-kN~j-'1)] 


(4.61) 


Theorem  4.3:  Assume  the  initial  condition  x^  is  an  unknown  parameter 
and  the  plant  is  scalar,  stable,  and  autonomous.  Then  in  the  limit  as 
the  measurement  noise  goes  to  zero,  the  likelihood  equation  for  the 
unknown  parameter  aQ  has  only  one  stable  root  and  that  root  is  aQ. 

Proof : 

If  h or  x^  (in  Equation  (4.26))  is  zero,  no  conclusion  on 
root  distribution  can  be  made.  Assume  that  h and  x are  not  zero. 

Consider  the  sign  of  AN‘  - BN’ , N positive  integer,  for 
a € (-1,1)  where: 


N N 


v • S £ 

v - £ J ^-v 


(4.62) 


(4.63) 


If  aQ  is  zero,  the  conclusion  is  immediate.  Take  aQ  € (0,1). 


a < 0 


(4.64) 


1.  ae  < a < I 

From  (4.53),  Ag'  - Bj1  - a0  - 

From  (4.43),  for  N > 1, 

dAff'  - d%*  - (N-j)aJJ,i‘2J~l(aN~J-J,~J)  < 0 ,0  < j < N-l 

(4.65) 

and  equals  zero  when  j » N . Thus 

%’  < V 

2.  a„  «*  a 

o 

From  (4.62)  and  (4.63),  Jty'  - %’  - 0 

3.  0 < a < aQ 

From  (4.64),  Aj ' - Bj ' > 0 
From  (4.43) , for  N > 1 

dAN'  - dBN • - (N-j)a0Ja*i+2j~l  (a0N~j-aN~l)  > 0 ,0  < j < N-l 
and  equals  zero  when  j * N.  Then  (4.66) 

an'  > aN' 

4.  a ■ 0 

From  (4.62),  (4.63), 

V - V - «o  > 0 

5.  -aQ  < a < 0 

By  Lemma  4.7,  Aw»  > Bw' 

6.  -1  < a < -ae 

By  Lemma  4.8,  Aw'  > BN ' 

The  above  follow  similarly  if  -J  < aQ  < 0.  | 

Little  more  can  be  said  about  the  random  polynomial  term  of  the 
likelihood  equation  (4.26)  than  could  be  said  when  the  known  case 
was  discussed.  Again,  the  expectation  of  the  random  polynomial  is  zero. 
When  the  initial  condition  xOQ  is  an  unknown  random  variable,  the 
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likelihood  equation  for  the  autonomous  plant  is  given  by  (3.48) : 


N 


o2[(o2+h2x2V)  Y*  ia2i_1  + h2e2 


N N 

s % 


ia2(i+j)-l 


+ h2x. 


N W 


AT  N 


- hx 


£ £ - S £ 

^ - (h2E2  ^ (Jc-i^a2-***-1  y^)  ( ^ a^i 


- 0 


(4.67) 

The  last  term  of  (4.67)  corresponds  to  the  xQO  unknown  parameter  case 
likelihood  equation  (4.26).  If  in  the  limit  as  n_£  goes  to  zero,  its 

a 

variance  o is  assumed  to  go  to  zero,  then  the  conclusions  for  the 
deterministic  portion  of  the  x^  unknown  parameter  likelihood  equation 
hold  for  the  deterministic  portion  of  (4.67).  However,  how  the  a2 
terms  affect  the  roots  of  the  likelihood  equation  when  o2  is  not  zero 
is  not  clear. 

The  scalar  autonomous  differencing  approach  likelihood  equation 
is  given  by  Equations  (3.70),  (3.71),  (3.73),  and  (3.50): 

-2  a2  ja2(i+j)-l  + ( it 

N N 

- -2a2  ja2(i+j)-l 

* 1 £ £ (,°iy°  * i,  *.JV' 


p-j 


(4.68) 


- 0 
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where : 


t */  V, * 


when  r - 0 and  - r\i+1  - «e)ni. 

If  a2  ■*  0 as  -*•  0,  then  the  conclusions  for  the  deterministic  portion 

of  the  xQQ  unknown  parameter  case  hold  for  this  case  if  yQ  j*  0. 

When  o2  » * 0 and  the  random  terms  are  considered,  the  deterministic 

conclusions  are  again  obscured.  In  this  case,  the  expectation  of  the 

S 

random  terms  is  not  zero.  Furthermore , the  term  £ a Kyk  can  have 

km0 

zeros  when  the  random  terms  are  considered.  Neglecting  the  a2  term, 
these  zeros  of  the  multiplicative  random  term  become  zeros  of  the  like- 
lihood equation  in  addition  to  the  stable  deterministic  root  however 
modified  by  the  additive  random  term. 


The  a2  term  can  be  written  ass 
-2a2  £5  Ja2 (i+j) _1  - -2o*(  £ a*h(  E 


(4.69) 


If  N > 1,  this  term  has  its  only  root  at  a * 0.  Its  effect  for  finite 
N and  small  o2  is  to  bias  the  stable  deterministic  root  toward  zero. 
Also,  since  this  term  is  bounded  on  (-1,1) , as  N increases,  the  product 
of  and  this  term  diminishes  to  zero. 

The  behavior  of  the  roots  of  the  likelihood  equations  in  the  inter- 
val -1  < a < 1 for  each  of  the  four  scalar  cases  when  the  forcing 
function  is  not  identically  zero  is  less  obvious.  In  an  earlier  sec- 

A 

tion,  the  fact  that  as  the  measurement  noise  goes  to  zero,  a approaches 
a0  was  established  for  forced  plants  in  all  cases  except  the  differen- 
cing approach  (because  only  the  autonomous  version  was  developed  here) . 


For  these  same  cases  without  the  limiting  condition  on  the  noise 


N+l  samples,  the  likelihood  equations  are  polynomials  of  odd  degree  if 


some  u £ 0 (not  including  ity-j)  whether  or  not  xQO  * 0.  Thus  the 

likelihood  equations  for  forced  plants  can  be  expected  to  have  at  least 


One  of  the  most  important  and  desirable  large  sample  characteris 


value.  Proofs  of  consistency  of  maximum  likelihood  estimators  are 


common  in  the  literature.  The  assumptions  on  which  the  proofs  are 


based  may  vary,  but  the  instance  is  relatively  rare  when  the  assumption 


of  independent  identically  distributed  samples  is  not  included 


Unfortunately,  the  samples  for  the  case  when  x is  known  or  xa  is 


unknown  parameter  are  not  identically  distributed.  In  the  x unknown 


Kendall  and  Stuart  [1961,  p.60]  present  a brief  general  discussion 


of  maximum  likelihood  estimation  when  the  samples  are  independent  but 


not  identically  distributed.  They  point  out  that  in  this  situation 


it  is  ho  longer  necessarily  true  that  ML  estimators  are  consistent  and 


the  ML  estimator  may  not  be  meaningful.  Thus  ML  estimators  in  non 


standard  situations  must  be  considered  individually 


When  the  initial  condition  x is  known  and  the  plant  is  scalar 


the  distribution  of  the  ith  sample  from  (3.7)  is 


pi(  y/») 


- 7 S?  “p'-^'v-'“lro  - “ £ *J'S-1;2'  J * 

(4.70) 

Assune  h,  2),  and  xe  not  zero  and  u^  not  identically  zero.  (If  instead, 
x0  m Of  neglect  p^  and  assume  u 0 ? 0 for  the  following  development.) 
Then  p^fy/a^  « Pi(yia2)  for  a.e.  y only  if  * a2.  Taking  the  limit 
of  pj  on  i gives: 

lim  pi(’y/a)  - p(y;a)  * ~/==f  exp/--^  (y  - f(a,u0,ulf...))2] 
i-*»  f 2tto^  2a^  w -1 

(4.71) 

The  function  / exists  and  is  continuous  on  the  interior  of  its  region 
of  convergence,  (-1,1),  if  the  are  assumed  to  be  uniformly  bounded, 
i.e.,  |u^|  <#<«*>,  and  a € (-1,1).  Since  f is  not  a constant, 

PCy/fij)  ■ pfy/a^)  for  a.e.  y only  if  a^  * a^. 

Let  the  subset  of  the  real  line  [-1,1]  be  denoted  by  CL  and 
assume  aQ,  the  true  value  of  the  unknown  parameter,  is  an  interior 
point  of  CL.  Since  CL  is  compact  and  p^  is  continuous  onCL»  there  exists 
a maximum  likelihood  estimator  of  aQ  based  on  N samples.  Denote  this 
estimator  by  a„. 

Let 


9i<y ,*)  * loq[pi(y,a)/pi(y,a0)) 

and 


(4.72) 


g(y,a)  - log[p(y,a)/p(y,a0)] 


(4.73) 

Both  g £ and  g are  integrable  for  all  a interior  to  CL  in  the  sense  that 
their  expectations  exist. 

The  following  theorem  is  an  extension  of  one  by  Jennrich  [1970] t 

for  independent  identically  distributed  random  variables. 

t Class  notes  in  Classical  Statistics,  Department  of  Mathematics, 
University  of  California,  Los  Angeles. 
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Theorem  4.3s  Under  the  above  assunptions , 


*„  ♦ a„  a. 8. 

« o 

Proof: 

Since  the  logarithm  is  a strictly  convex  function,  by  Jensen's 
inequality , 

fgi(y,a)pi(y,a0)du(y)  < log  !pi(y,a)du(y)  - 0 (4.74) 

with  equality  holding  only  if  a * a . Let  B be  a neighborhood 


of  a.  Then, 

sup  gi(y,a)^gi(yfa)  as  B+a. 
a €B 

Define  the  expectation  operator  E^  as 

Bi(.;  - f (Jpi(y,a0)dp(y) 

By  the  monotone  convergence  theorem 

Bi/sup  gily,o.)]^Eilgi(yra)]  as  B+a 
a£B 

and  therefore  there  exists  a B such  that 

E^/sup  gi(y,a)]  < 0 
a £B 


(4.75) 


(4  .76) 


(4,77) 


(4.78) 


whenever  a ? aQ.  Let  D CQ.be  a neighborhood  of  a , and  let 

c c 

D - Q.-D.  Because  CL  is  compact,  the  complement  D can  be 
covered  by  a finite  nunber  of  B neighborhoods,  and  thus 


l sup  g.( y,u);  < 0 
a €D 

Similarly, 

£7  sup  g(y,a)]  < 0 
a€Dc 


(4.79) 


(4.80) 


The  variance  of  g±(y,a)  depends  on  the  first,  second  and  fourth 
moments  of  y which  are  bounded  above.  Then  by  the  strong  law 

of  large  numbers,  for  a.e.  set  of  samples  {y^}  from  p^(y,aQ), 

1 - 
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N 

^ B±( supc  g^y, a)]  < 0 


(4.81) 


Choose  such  a sample  and  let  &N  - h^fy^.. . ,yN)  . By  definition 


of  the  ML  estimate, 


j \ jEj  J lo9 


1 , Pl(yl,aN)  m',pN(yS'aN) 


Pl(yl,ao)  "'PN(yN,ao) 


(4.82) 


Thus,  a^CP  for  sufficiently  large  iV.  Since  Z?  is  arbitrary, 

a 

aw  -*  aQ  a.s.  I 

When  the  system  is  autonomous,  the  above  proof  for  consistency 
does  not  hold.  Referring  to  the  limit  in  (4.71),  aixQ  0 as  i -*■  * 
for  a an  interior  point  of  CL.  The  uniqueness  of  the  density  with 
respect  to  a is  lost  in  the  limit. 

If  xQ  is  an  unknown  parameter,  the  above  proof  must  be  reworked 
with  the  unknown  parameter  as  a vector  instead  of  a scalar.  This 
appears  to  be  a natural  extension  of  the  theorem.  Using  a different 
approach,  Aoki  and  Yue  [1970]  have  shown  consistency  for  this  case. 

The  above  theorem  can  also  be  used  to  show  consistency  when  xQ  is 
an  unknown  random  variable.  The  proof  follows  through  directly  when 
the  densities  for  this  situation  are  conditioned  on  xQ.  Since  consis- 
tency exists  for  a.e.  xQ,  then  aN  -*■  aQ  a.s. 

In  the  final  case,  the  differencing  approach,  the  samples  are  not 
independent.  There  does  not  appear  to  be  any  simple  technique  to  get 
around  this  problem  as  there  was  when  xQ  was  an  unknown  random  variable. 


Wald  [1948]  and  Aoki  and  Yue,  however,  do  consider  the  problem  of 
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For  vector  samples  and  parameters  most  of  the  above  should  go 
through  with  perhaps  seme  additional  algebra.  Aokl  and  Yue  treat  the 


t 


I 

| 

* 

■ 


companion  matrix  case  when  xQ  is  an  unknown  parameter.  Mann  and  Wald 
[1943]  develop  the  companion  matrix  case  for  the  differencing  approach 
but  with  independent  samples. 

4.4  APPROXIMATIONS 

A characteristic  common  to  the  ML  estimators  in  all  four  cases 
considered  is  that  all  the  samples  must  be  saved  to  be  able  to  evaluate 
the  estimate  a^,  and  as  the  number  of  samples  N increases,  the  amount 
of  computation  involved  in  this  evaluation  increases.  This  situation 
is  inconsistent  with  the  requirement  of  real  time  identification.  The 
possibility  of  condensing  the  data  through  sufficient  statistics  was 
eliminated  earlier.  Exact  algebraic  factoring  appears  hopeless. 
Approaches  to  approximating  the  inverse  of  the  covariance  matrix  in  the 
differencing  approach  are  given  by  Cochrane  and  Orcutt  [1949] , Hannan 
[1960,  p.47]  and  Anderson  [1963] . None  of  these  appear  to  be  very 
satisfactory. 

The  approximations  with  most  appeal  involve  some  form  of  truncation 
of  the  likelihood  equation  polynomial.  The  simplest  approach  of  this 
nature  is  to  truncate  the  polynomials  after  some  arbitrary  number  of 
terms.  However,  this  limits  the  number  of  samples  that  can  be  used 
to  compute  the  estimate,  and  as  a result,  new  data  beyond  seme  point 
will  not  be  used.  Forgetting  for  the  moment  how  to  accomodate  initial 
condition  information,  for  systems  whose  parameters  are  in  fact  slowly 
varying  with  time,  the  truncated  polynomial  could  be  made  to  undergo  a 


continual  shift  in  indices  so  that  old  data  is  dropped  off  as  new  data 
cctnes  in.  If,  however,  use  of  all  the  data  is  desirable  as  would  be 


the  case  for  constant  parameters,  some  sort  of  averaging  scheme  can 
be  used  with  the  truncation.  This  latter  approach  is  pursued  in  what 
follows. 

There  are  two  obvious  types  of  averaging  modifications  that  could 
be  made  to  the  shifting  polynomial  scheme  just  described.  One  would 
be  to  define  a new  estimate  as  the  running  average  of  the  estimates 
from  the  shifting  polynomials.  Because  in  certain  situations  this 
estimate  tends  to  have  an  (infinite  variance)  Cauchy  distribution,  it 
does  not  appear  to  be  as  useful  as  an  alternative  scheme  which  keeps 
a running  average  of  each  of  the  coefficients  of  the  shifting  poly- 
nomials. The  latter  scheme  bases  estimates  on  the  truncated  poly- 
nomial evaluated  using  averaged  coefficients. 

Both  the  r known  and  the  unknown  random  variable  cases  use 
o o 

initial  condition  information  in  the  ML  estimate.  Use  of  this 
information  in  either  shifting  polynomial  scheme  generates  another 
growing  polynomial  required  to  shift  the  origin  thus  nullifying  the 
computational  advantage  gained  by  truncating.  The  coefficient  averaging 
scheme  for  the  xQ  unknown  parameter  case,  which  is  more  or  less  a 
steady  state  version  of  the  other  two,  will  be  assumed  to  apply  to  all 
three  cases. 


4.4.1  AVERAGE  COEFFICIENT  APPROXIMATIONS  TO  THE  LIKELIHOOD  EQUATIONS 
The  average  coefficient  approximation  equation  when  xD  is  an 
unknown  parameter  can  be  developed  from  Equation  (3.22) . The  number 


of  samples,  (N+l)  in  Equation  (3.22),  at  which  to  truncate  the 
dependent  observations . 


polynomials  is  arbitrary,  but  at  least  two  samples  must  be  used  over 
which  to  average.  Because  truncation  after  two  samples  yields  the 
simplest  result,  the  truncation  will  be  taken  at  that  point.  Thus  for 


two  samples,  the  likelihood  equation  becomes: 

yofy1-hbuo;  + [ (y^hbu^-y^Ja  - y0(y1-hbuQ)a2  - 0 (4.83) 

or  with  averaging  over  N + 1 samples  gives  the  average  coefficient 
expression  for  xQ  unknown  parameter  (as  well  as  xQ  known  and  xQ  unknown 


random  variable} . 


V‘2  - v*  - V * « 


where: 


V * s £ 

"»'  * ife  - £ li -I2) 


(4.84) 


(4.85) 


(4.86) 


For  the  same  situation  but  with  the  vector-valued  autonomous  system. 

Equations  (3.29)  and  (3.33)  give 

- HrR~lH  + ArHrR~lHA  (4.87) 

APj  - lHTR~ly1-HtR~lHA$1~lHTR~ly0-HTR~1HA4>1~1ATHTR~ly1] 

[y0rR~1H+y1TR~1HA]  = 0 (4.88) 

or,  averaging  over  N + 1 samples: 

/ N x / N v 

HrR~l  ^ 2l  *1*1-1*  jR~lli  + *1*1* 

- b'r-'ha*!-1™-1  ^ + (j$  r'1m] 

- HrR~1HA^1~1ArHrR~1  jjjj  ^ VlSi-l')*"1*  + (s  tfiyiT)  jrl“]  * 0 

(4  ..89) 
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Similarly,  for  the  differencing  approach,  (3.64)  gives* 

o2a3  + y0(y1-hbu0)az  + re2+y02-(y1-hbu0)z)a  - y^y^hbuj  - 0 

t4.90) 

or  treating  the  shifted  y0's  as  known  initial  conditions  and  averaging 
over  N + 1 samples  gives  the  average  coefficient  expression  for  the 
differencing  approach: 

a2a3  + Cw'a2  + to2  - D^Ja  - <V  - 0 (4'91) 

If  the  plant  is  vector-valued  and  H * I , the  identity  matrix,  then  from 
(3.60)  the  two  sample  average  coefficient  approximation  becomes: 

-**  * i ( JJ 

+ s te  (yi-Ayi_1-Bu._1)(yi~Ay._i-Bu._1)^(R+ARArrlAR  = 0 

(4.92) 

4.4.2  PROPERTIES  OF  THE  TWO-SAMPLE  AVERAGE  COEFFICIENT  APPROXIMATIONS 

The  finite  sample  properties  of  the  two-sample  average  coefficient 
approximations  to  the  scalar  plant  likelihood  equations  will  be 
investigated  first.  When  the  initial  condition  is  an  unknown  parameter, 
the  average  coefficient  equation  (4.82)  is  a quadratic  with  roots: 

«V  ± / (Dn')2  + 4(CN')2)/(2CN’)  (4 .-93) 

Two  conclusions  are  immediate.  The  two  roots  are  always  real.  By  the 
triangle  inequality,  one  root  lies  in  the  closed  interval  [-1,1] , and 
the  other  root  lies  outside  the  open  interval  (-1,1)  unless  an 

event  which  occurs  with  probability  zero.  In  that  case,  the  roots 
are  ±1. 

Earlier,  in  the  case  where  x is  an  unknown  parameter,  the  ML 
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estimate  & was  shown  to  approach  the  true  parameter  a0  as  the  measure- 
ment noise  goes  to  zero.  Since  this  property  holds  independent  of 
which  sample  in  the  sequence  of  samples  iy^}  is  denoted  yQ  (the  first 
sample  of  the  string  to  be  used  for  the  two-sample  ML  estimate) , the 
average  coefficient  approximation  (4.84)  for  any  N yields  8 * ac  for 
zero  noise. 

Furthermore,  the  expected  value  of  the  noise  terms  of  (4.84)  are 
easily  seen  to  be  zero.  In  CN' , the  noise  terms  are  weighted  sums  of 
ni  and  a sum  of  product  terms  of  the  form  Because  the  are 

independent  and  zero  mean,  the  expectation  of  each  term  in  the  sums  is 
zero.  In  DN' , the  noise  terms  are  also  weighted  suns  of  and,  in 
addition,  a telescoping  sum  of  square  terms  of  the  form  n^2.  The 
telescoping  sum  reduces  to  %2_h02»  whose  expectation  is  zero. 

This  approximation  (4.84)  is  related  to  the  ML  estimator  of  Levin 
[1964]  discussed  in  Chapter  2.  If  his  result  is  applied  as  each  new 
sample  is  made,  in  the  so-called  "overlapping"  mode,  instead  of  after 
collecting  groups  of  samples  as  intended,  his  result  becomes  identical 
to  (4.84) . 


The  finite  sample  properties  of  the  average  coefficient  approxi- 
mation to  the  differencing  approach  likelihood  equation  are  more 
difficult  to  establish.  Now,  the  equation,  (4.91),  is  a cubic.  An 
indication  of  the  root  location  cam  be  obtained  by  first  considering 
(4.91)  without  the  o2a3  term.  This  portion  of  the  equation  has  two 
real  roots  which  can  be  expressed  as: 


Again,  unless  D ' 
N 


[-(a2-DN')  ± /(o2-Dn>)*+4Cn 
= a2,  one  root  is  stable. 


,2]/(2Cn') 


(*.94) 


and  the  other  is  unstable. 
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When  Dn'  - o2,  the  roots  ere  ± 1. 

Closer  examination  of  (4.94)  shows  that  the  pair  of  roots  falls 
into  one  of  two  categories.  Either  the  stable  root  is  positive,  and 
the  unstable  root  is  negative  or  vice  versa.  In  the  former  case  when 
CN ' > 0 (and  o2~Dn'  > 0) , the  o2a3  term  has  the  effect  of  biasing  the 
stable  root  toward  sero  and  either  biasing  the  unstable  root  away  from 
sero  while  introducing  another  negative  unstable  root  or  merely 
removing the  unstable  root.  In  the  latter  case  when  CN'  > 0 (and 
°2-v  < 0) , the  o2a3  term  moves  the  unstable  positive  root  toward  sero 
to  the  point  where  it  becomes  stable  if  o2  > DN'/2.  Also,  if  o2  s 
the  stable  root  is  shifted  toward  -1,  and  a negative  unstable  root  is 
introduced.  If  a2  > DN' /2 , either  the  stable  root  disappears  or  it 
becomes  unstable  (and  negative)  and  a still  more  negative  root  is  added. 
Similar  conclusions  follow  for  C ' < 0. 

N 

Unless  o2  0 as  the  noise  goes  to  sero,  the  sero  noise  condition 
does  not  give  the  true  parameter  as  the  estimate.  As  was  the  case  with 
the  x0  unknown  parameter  approximation,  the  noise  terms  have  sero 
expectation. 

The  two-sample  average  coefficient  approximation  to  the  likelihood 
equation  for  the  differencing  approach,  Equation  (4.91),  could  have 
been  derived  in  two  other  ways  each  of  which  gives  further  insight  into 
the  nature  of  the  approximation.  In  one,  the  approximation  (4.91) 
follows  directly  from  the  scalar  autonomous  version  of  the  likelihood 
function  (3.64)  with  the  noise  covariance  P of  Equation  (3.66) 
approximated  ass 

P ■ a2  diag  (1+*2 . ,1+*2)  (4.95) 
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In  the  second,  • modified  version  of  the  likelihood  function  can  be 
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developed  using  grouped  samples  in  the  sense  of  Levin  by  basing  the 
likelihood  function  on  the  model  (3.50)  with  1*0 ,2 ,4,* • • . The  resulting 
likelihood  equation  is  then  used  in  an  overlapping  mode  by  resubscript- 
ing such  that  in  effect  i-0,2,***  once  again  as  in  (3.50). 

The  large  sample  properties  of  the  approximate  ML  estimate  in  the 
xQ  unknown  parameter  case  can  be  inferred  by  the  large  sample  character- 
istics of  its  "likelihood  equation",  Equation  (4.84). 

Assume  the  {u^}  are  uniformly  bounded  by  M>0  and  that  |a0|<l.  Then 
the  (x^)  are  uniformly  bounded  also.  From  Equation  (4.85) : 


1 N 

<**1-1  + 71 i-i)(haoxi-i  + V 

1 N 

■ A ^2-o*i-l2  + 1-1*  1-1  + T1i-ini^ 

^ (*,96) 

Examining  (4.96)  term  by  term: 

J t hZ*o*i-l2  " E ^oi-1Xo  + jEJ  ao1""1”^uJ-l^2  (4-97) 

(i>l) 


11m 


(aoZ 


ii-1 


0 


(4.98) 


;l  E E 

N-*~  N 1*2  JZl 
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By  the  strong  law  of  large  nunbers. 


1 » 

n £*  •*’ 0 

(4.101) 

Also, 

(4.102) 

k 

/i-i 

'iti 

(4.103) 

So, 

4£  Vm]  - " 

(4.104) 

and. 

4 if; 

(4.105) 

Then  by  Chebyshev’s  inequality,  for 

any  e » 0, 

(4.106) 

or 

(4.107) 

A 

stronger  result  than  (4.107) 

can  be  shown  by  using 

a theorem  of 

Revesz 

[1968,  p.  87],  By  (4.102)  and  (4.103)  and  since. 

± W<  < o* 

f l0|  ^dir  » o4r(3)  < 
F1  * 

„ (4.108) 

then. 

2 I 

J 5 Vw  * » »••• 

(4.109) 
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From  (4.86)  , 


v * »[s  * "i>2 ■ Ik  "“i-j  * 


(4 . 110) 


Using  (4.97)  - (4.100), 
, N 


h2(*02-l)  lim  £ £ iri-j2  - -*2*2  ^ 
I*i  -i_ac 


Since, 


L (t\  2 - t\  2)  0 

N 1"n  'o  ' 


then  by  (4.111)  and  (4.101), 

-h2M2  1+*a  S-lo,.'  | < 0 
J-«o 

and  from  (4.98),  (4.99),  (4.100),  (4.101),  and  (4.109), 


(4.111) 


(4.112) 


(4.115) 


0 < |C„*|  < h2M2  *<?  , (4‘114) 

d-«o )2 

At  this  stage  all  that  has  been  shown  is  .that  if  (u^)  uniformly 


bounded  and  |a0|  < 1,  the  average  coefficient  approximation  likelihood 
equation  (4.84)  does  approach  seme  limit.  As  a special  case,  consider 
ui  = Af  > 0.  Then 


C ' - h2M2  and  D'  - -h2M2 

(l-*0) 2 i‘*o 


(4.115) 


The  two  limiting  roots  of  (4.84)  are  found  by  introducing  (4.115)  into 
(4.93) , 

DJ  i / (Dj)2+4(Cm')2 

57 (4.U6) 

The  two  roots  are  - — and  a„. 

ao 

In  the  average  coefficient  approximation  for  the  autonomous 
differencing  approach,  Equation  (4.91) , CN'  and  D^'  both  go  to  sero 
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SECTION  V 

NUMERICAL  CONSIDERATIONS  OF  THE  IDENTIFIERS 

5.1  INTRODUCTION 

The  objectives  of  the  discussions  in  this  chapter  are  to  illustrate 
some  of  the  properties  of  the  identifiers  which  were  established  by 
theorems  in  the  previous  chapter  and  to  investigate  the  mechanics  for 
the  numerical  evaluation  of  the  parameter  estimates  frcm  the  nonlinear 
likelihood  equations.  To  this  end  the  computer  simulation  results, 
grouped  according  to  the  four  initial  condition  categories  considered 
in  the  preceding  chapters,  are  presented  first.  The  results  and  related 
computational  implications  are  then  explored. 

The  terms  "cost"  and  "cost  function"  used  in  this  chapter,  with 
some  minor  modifications  detailed  in  the  following  section,  were  defined 
in  Chapter  3.  Basically,  they  refer  to  the  function  formed  by  taking 
the  natural  logarithm  of  the  likelihood  function  and  then  discarding 
the  additive  terms  and  common  factors  which  do  not  depend  on  the 
unknown  parameter.  Terms  of  the  type  "derivative  of  the  cost  function" 
and  "derivative  function"  refer  to  the  function  obtained  by  differenti- 
ating the  cost  function  with  respect  to  the  unknown  parameter.  (The 
equation  which  results  by  setting  that  derivative  function  equal  to 
zero  is  the  likelihood  equation.) 


5.2  SIMULATION  RESULTS 

The  figures  in  this  section  are  based  on  computations  and  noise 
generated  on  the  IBM  360,  model  91  and  were  prepared  on  a Cal-Comp 
plotter.  Results  are  given  for  both  the  scalar  model  and  the  multi- 
dimensional models. 
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5.2.1  SCAUR  MODEL  RESULTS 

The  scaler  nodal  is  defined  by  Equation  (3.6) v which  is  repeated 
below  for  convenience. 


Vi  • “j  * “i 
»i  - tei  * "i 


i ■ 0 tl 


(S.l) 


(In  the  text,  in  order  to  distinguish  between  the  parameter  a and  the 
true  value  of  a in  (5.1) , the  true  value  of  a is  denoted  by  ac.)  The 
noise  sequence  {r^}  was  taken  as  gaussian  independent  identically 
distributed,  each  member  with  distribution: 

hi  ~T7(0,o2;  (5.2) 

The  known  coefficients  were  assuned  to  be  unity,  i.e., 

h - b - 1 (5.3) 

(For  purposes  of  exercising  the  identification  schemes,  various 
assinptions  about  the  initial  conditions  were  made,  but  these  did  not 
affect  the  model.) 

Some  theory  on  optimal  input  selection  for  identification  exists, 
e.g.,  Staley  (1968).  However,  since  the  normal  operating  input  restric- 
tion was  assumed  for  this  study,  no  optimisation  was  attempted. 

Instead,  for  simplicity  a step  input,  « constant,  was  used. 

The  four  identification  problems  discussed  in  Chapter  3 were  simu- 
lated. The  first  three  differ  strictly  by  assumptions  on  the  nature 
of  the  initial  condition  xQ,  i.e.,  xQ  known,  xq  unknown  parameter,  or 
xQ  unknown  random  variable.  The  fourth  identification  problem,  the 
differencing  approach,  actually  is  based  on  a model,  (3.49),  which 
differs  somewhat  from  (5.1).  By  differencing  the  measurements  yj 
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in  (5.1)  the  model*  become  equivalent  except  that  in  the  latter  the 
initial  meaaurement  ia  considered  as  a constant.  (The  differencing 
approach  most  closely  corresponds  to  the  xQ  unknown  parameter  case.) 

The  equations  for  the  four  identification  problems  that  were 
simulated  are  given  in  Chapter  3.  For  the  case  where  the  initial 
condition  *o  is  known  and  the  system  is  scalar,  the  cost  function  is 
given  by  (3.8) , and  the  derivative  function  is  given  by  the  left  side 
of  Equation  (3.9) . 

The  case  where  xc  is  an  unknown  parameter  requires  a bit  of  dis- 
cussion in  order  to  maintain  reasonable  consistency  in  the  terminology. 
The  difficulty  arises  because  there  are  two  unknown  parameters,  *0  and 
a.  Since  estimation  of  a is  of  primary  interest  the  second  parameter, 

xQ,  was  eliminated  through  using  * instead  of  x in  the  cost  and 

o o 

derivative  functions.  Thus  the  cost  function  is  given  by  Equation 
(3.18)  but  with  xQ  replaced  by  »Q  of  Equation  (3.20).  The  derivative 
function  is  given  by  the  left  side  of  Equation  (3.21)  with  xQ  replaced 
fay  *0  ot  Equation  (3.20).  (When  the  derivative  is  set  equal  to  zero 
it  is  equivalent  to  Equation  (3.22),  the  equation  for  A.  Strictly 
speaking,  the  likelihood  equation  is  neither  of  these  but  is  the  pair 

4 

of  Equations  (3.19)  and  (3.21).)  ^ 

For  the  case  where  x is  a gaussian  random  variable  with  known 

o 

mean  xQ  and  variance  e2,  the  cost  function  which  was  simulated  is 
given  by  (3.41)  normalised  with  respect  to  a2.  The  derivative  function 
used  in  the  simulation  is  given  by  the  left  side  of  (3.44).  However, 
in  order  that  the  derivative  and  cost  functions  correspond,  the 
derivative  must  be  divided  by  (a2  + h2e2araj2. 
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The  differencing  approach  cost  function  was  not  sinulated.  The 
left  side  of  the  Equation  (3.70)  was  taken  as  the  derivative  function 


(Only  the  autonomous  version  was  simulated.) 

Because  the  nunber  of  figures  is  relatively  large,  the  figure 

nvsabers  are  coded  to  help  identify  the  situation  the  associated  figure 

represents.  The  nunerical  designation  1 through  4 corresponds  to  x° 

known,  xQ  unknown  parameter,  x0  unknown  random  variable,  and  the 

differencing  approach,  respectively.  The  letter  designations  a through 

j correspond  to  the  various  situations  which  were  simulated  as  described 

below.  Again,  because  of  the  nunber  of  figures  and  the  varying  amounts 

of  new  information  introduced  by  them,  not  all  the  figures  that  were 

developed  have  been  included.  This  accounts  for  what  appear  to  be 

gaps  in  the  literal  numbering  sequences. 

The  first  three  groups  of  figures  are  an  illustration  of  the 

evolution  of  the  MLE  (maximum  likelihood  estimate)  of  aQ  for  specific, 

but  arbitrary  realisations  of  the  measurement  noise  sequence  }. 

Also,  comparison  of  the  MLE  of  to  the  estimate  of  a by  other 

o o 

schemes  (described  later)  - namely,  least  squares  (LSQ) , 3-point 
recursive  fit  (3PT) , and  average  coefficient  (AVC)  are  shown.  These 
figures  are  shown  first  to  unify  the  later  ones  which  concentrate  more 
on  the  immediate  issue  - the  solution  for  I for  a given  number  of 

N 

samples . 

a.  Comparison  of  ML,  least  squares,  average  coefficient,  3-point 
• recursive  fit  estimation  schemes:  Figures  5-la,  5-2a,  5-3a, 
and  5-4a.  These  figures  correspond  to  xq  known,  xq  unknown 
parameter,  xq  random  variable,  and  differencing  approach. 


11 
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respectively.  For  ell  four  f'^uree,  « » -0.5,  x « -3.0,  and 

o o 

Uj  = 1*0  except  in  5-4*  where  £ 0 end  xQ  - 6.0.  The  level 
of  the  measurement  noise  is  relatively  low  with  a2  - 0.01.  The 
increscent  for  the  1-point  recursive  fit  is  0.1  with  points 
located  at  a ■ -0.65,  -0.55,  and  -0.45.  The  maximum  likelihood 
estimates  were  found  by  regula  falsi  iterative  solution  for  the 
roots  of  the  likelihood  equations. 

The  maximum  number  of  samples  shown  is  30.  Most  curves  were 
computed  up  through  60  samples.  The  behavior  of  the  curves  for 
the  second  30  samples  was  similar  to  that  of  the  first  30 
samples,  except  for  the  first  few  samples. 

In  Figure  5-3a,  the  initial  condition  mean  and  variance  are 

*Q  m -3.0  and  e2  » 0.02.  (Another  simulation  was  made,  not 

included  here,  with  conditions  identical  to  those  of  Figure 

5-3a  except  that  x ' - 6.0.  The  fact  that  the  true  initial 
o 

condition  and  the  mean  initial  condition  were  grossly  mismatched 
in  terms  of  the  variance  e2  resulted  in  a transient  in  the  MLE 
of  aQ  for  the  first  few  samples.) 
b.  Comparison  of  ML,  least  squares,  average  coefficient,  3-point 
recursive  fit  estimation  schemes : Figure  5-lb.  This  group  is 
computed  under  the  same  conditions  as  these  in  group  (a)  except 
that  a0  m 0.75  and  *Q  m XD  m 2.0  (*Q  * 0.6  for  differencing 
approach),  and  the  3-point  fit  was  made  at  a * 0.6,  0.7,  and 
0.8. 

Figures  fox  xq  unknown  parameter  and  xQ  unknown  random 
variable  are  not  included,  but  they  appear  very  similar  to  the 
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*c  known  case.  Figure  5-lb,  much  as  was  the  situation  in  group 
(a) . The  figure  for  the  differencing  approach  also  is  not 
included  but  initially  resembles  Figure  5-lb  and  then  settles 
down  as  in  Figure  5-4a  except  the  average  coefficient  and  least 
squares  solutions  drift  at  a greater  rate. 

Both  for  this  group  and  group  (a)  with  the  exception  of 
differencing  approach,  another  variation  of  the  3-point  recur- 
sive fit  was  computed  but  not  included  among  the  figures.  The 
increment  for  the  fit  was  reset  to  0.2  from  0.1  with  points  at 
a - -0.8,  -0.6  and  -0.4  for  aQ  - -0.5  and  at  a - 0.45,  0.65, 
and  0.85  for  aQ  - 0.75.  This  increase  in  the  point  separation 
resulted  in  estimates  which  differed  from  the  true  parameter 
value  by  about  10%  after  30  samples. 
c.  Comparison  of  ML,  least  squares,  average  coefficient,  3-point 
recursive  fit  estimation  schemes s Figure  5-lc.  This  group  is 
computed  under  the  same  conditions  as  those  in  group  (a)  except 
o2  ” 0.4356  and  the  number  of  samples  is  extended  to  60. 

Again,  figures  for  xq  unknown  parameter  and  xQ  unknown 
random  variable  are  not  included  but  appear  very  similar  to 
Figure  5-lc.  The  differencing  approach  was  not  simulated  for 
this  set  of  conditions. 

The  measurement  noise  variance  was  increased  over  that  in 
group  (a)  to  observe  the  effectiveness  of  the  schemes  when 
operating  under  moderate  noise  levels.  The  value  of  0.4356  for 
a*  was  selected  to  make  the  one-o  value  of  the  noise  (approx- 
imately) equal  to  2/3,  the  limiting  value  of  x^  with  = 1. 
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Of  prime  importance  in  the  numerical  solution  for  the  MLE  of  aQ 
is  the  expected  shape  of  the  derivative  function  and  the  root  distribu- 
tion. Preliminary  to  displaying  various  examples  of  derivative  func- 
tions, the  cost  function  is  presented  in  order  that  source  of  the  sharp 
fluctuations  in  the  derivative  curves  be  better  understood. 

d.  Cost  functions  for  the  MLE:  Figure  5-ld.  For  this  group, 

aQ  * -0.5,  xQ  ■ -3.0,  a2  » 0.01  and  = 1.0.  (The  cost  func- 
tion for  the  differencing  approach  was  not  evaluated.)  In  the 

case  where  x was  assumed  to  be  a random  variable,  x - -3.0 
o o 

and  e2  * 0.02.  The  curves  are  given  for  3,  5,  10,  15,  and  20 
samples  and  -1.5  < a < 1.5.  Though  not  necessarily  very  similar 
overall,  the  curves  for  xQ  unknown  parameter  and  xQ  random 
variable  do  exhibit  the  essential  feature  of  Figure  5-ld,  the 
oscillation  at  just  beyond  a * 1.  The  figures  for  these  two 
cases  are  not  included. 

The  derivative  function  curves  for  various  parameter  values  and 
noise  levels  are  included  in  the  next  three  groups. 

e.  Derivative  functions  for  the  MLE:  Figure  5-le.  For  this  group, 

a = -0.5,  x = -3.0,  c2  * 0.01,  and  u.  = 1.0  (except  in  the 
o o i 

differencing  approach  where  = 0 and  xQ  * 6.0) . When  xQ  is 

taken  as  a random  variable,  x = -3.0  and  e2  = 0.02.  The  curves 

o 

are  computed  for  3,  5,  10,  15,  and  20  awples  and  -1.5  <_  a <_1.5. 

Over  the  range  displayed,  for  the  scale  employed  the  curves 
for  the  four  ML  estimators  are  virtually  identical  to  the  no- 
noise curves  of  group  (h) . (They,  of  course,  are  not  identical 
as  earn  easily  be  seen  from  the  curves  of  group  (a).)  The  xq 
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known  case.  Figure  5-le,  is  presented  to  illustrate  the  simi- 
larity. 

f.  Derivative  functions  for  the  MLB:  Figures  5 -If,  5-2 f,  5-3 f, 
and  5-4 f.  The  conditions  for  this  group  are  the  same  as  those 
of  group  (e)  except  ac  - 0.75  and  xQ  « xQ  » 2.0  (or  0.6  for  the 
differencing  approach.  Figure  5-3f) . All  cases  are  shown. 

g.  Derivative  functions  for  the  NI£:  Figures  5-lg,  5-2g,  and  5-3g. 
This  group  is  identical  to  group  (e)  except  the  noise  level  was 
raised  by  increasing  the  measurement  noise  variance  from 

a2  » 0.01  to  o2  » 1.0.  The  derivative  function  for  the  differ- 
encing approach  was  not  computed. 

The  curves  for  all  of  the  above  groups  of  course  represent  re- 
sponses to  only  one  possible  realization  of  the  measurement  noise 
sequence.  Just  how  typical  they  are  is  difficult  to  establish,  espe- 
cially for  small  numbers  of  samples.  To  provide  a deterministic 
reference  for  the  derivative  function  curves,  limiting  versions  were 
computed  ’here  a2  -*■  0 and  0. 

h.  No  noise  derivative  functions  for  the  MLE:  Figures  5-lh,  5-2h, 

5 -3h,  and  5-4h.  The  conditions  for  this  group  are  the  same  as 
for  group  (e)or  (g)  except  a2  = * 0. 

i.  No  noise  derivative  functions  for  the  MLB:  Figure  5-4i.  Only 
the  differencing  approach  is  presented.  (The  reason  for 
including  this  figure  is  to  provide  a contrast  with  Figure  5-4f 
similar  to  what  exists  between  groups  (g)  and  (h)  for  the  three 
other  ML  estimators.)  The  conditions  in  this  group  are  the 
same  as  those  in  group  (f)  except  that  a2  * r\^  - 0. 
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In  another  attempt  to  overcome  the  Inconcluaiveness  of  single 
realizations,  a Monte  Carlo  approach  was  used  wherein  a random  set  of 
measurement  noise  sequences  was  generated.  The  simulation  of  the  model 
was  repeated  using  each  of  the  noise  sequences  in  turn.  Based  on  the 
accumulated  data  from  the  set  of  experiments,  4 frequency  distributions 
were  made.  With  these,  some  insight  into  derivative  function  root 
distribution  and  convergence  of  the  estimate  can  be  derived. 

j.  Frequency  distributions  for  the  MLE  of  aQ:  Figures  5-1 J,  5-2 j, 
5-3 j,  and  5-4 j.  For  this  group,  aQ  » 0.75,  o2  ■ 0.01, 
xQ  - 2.0,  xQ  ■ 2.0,  e2  “ 0.02,  and  = 1.0  (except  for  5-4j 
where  xQ  « 0.6  and  = 0) . The  low  noise  combination  of 
aQ  • 0.75  and  c2  - 0.01  was  chosen  to  help  insure  rapid  con- 
vergence primarily  for  economic  reasons. 

In  Figures  5-1 j , 5-2 j , and  5-3 j , 4 distributions  are  given 
for  3,  10,  and  60  samples  based  on  100  experiments  with  81 
frequencies  corresponding  to  steps  in  4 of  0.002  where 
0.675  < 4 < 0.B37.  In  Figure  5-4 j , 4 distributions  are  given 
for  3,  10,  and  30  samples  based  on  50  experiments  with  15 
frequencies  corresponding  to  steps  in  4 of  0.04  where 
0.45  < 4 < 1.05. 

Since  x0  was  constant  for  all  the  experiments,  Figure  5-3 j 
for  the  xQ  random  variable  case  should  be  viewed  as  distribu- 
tions conditioned  on  x . Also,  y was  simulated  as  a random 

o o 

variable  for  axl  cases. 
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Figure  5-la.  Comparison  of  ML,  least  squares,  average 
coefficient,  and  3-point  fit  estimation  for  *0  known 
case.  (ao— 0.5,  o2-0.01) . 
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NUMBER  OF  SAMPLES 

Figure  5-lb.  Comparison  of  ML,  least  squares,  average 
coefficient,  and  3-point  fit  estimation  for  xQ  known 
case.  (a  -0.75,  o2-O.01). 
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Figure  5-ld.  Cost  function  of  the  MLB  in  the  xQ  known  case 
for  3,  5,  10,  15,  and  20  samplea.  (aQ— 0.5,  o2-0.01) . 
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Figure  5-le.  Derivative  function  of  the  MLE  in  the  xQ  known 
case  for  3,  5,  10,  15,  and  20  samples.  (ao»-0.5,  a2*0.01) . 
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Figure  5-1 j.  Frequency  distribution  for  MLE  of  a0  in 
x0  known  case.  lao~0.75,  o2«O.01)  . 
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Figure  5-2a.  Comparison  of  ML,  least  squares,  average 
coefficient,  and  3-point  fit  estimation  for  xQ  unknown 
parameter.  (an— 0.5,  o2-0.01)  . 
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PARAMETER  VARIABLE  A 


Figure  5-2g.  Derivative  function  of  the  MLE  in  the  xD 
unknown  parameter  case  for  3,  5,  10,  15,  and  20 
samples.  (ao—0.5,  o2-1.0) . 


PARAMETER  VARIABLE  A 

Figure  5-2h.  No  noise  derivative  function  of  the  MLE 
in  the  xQ  unknown  parameter  case  for  3,  5,  10,  15, 
and  20  samples,  (a  >-0.5,  o2>0) . 
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PARAMETER  VARIABLE  A 

Figure  5-3f.  Derivative  function  of  the  MLE  in  the 
xQ  random  variable  case  for  3,  5,  10,  15,  and  20 
samples.  (aQ“0. 75,  cr2-0.01). 
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xQ  random  variable  case  for  3,  5,  10,  15,  and  20 
samples.  (aD— 0.5,  oVl.O). 
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PARAMETER  VARIABLE  A 


Figure  5-3h.  No  noise  derivative  function  of  the  MLE 
in  the  xQ  random  variable  case  for  3,  5,  10,  15, 
and  20  samples.  (ao»-0.5,  cr2-0) . 
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Figure  5-3j.  Frequency  distribution  for  MLE  of  ac  in 
xQ  random  variable  case.  (ao«0.75,  o2-0.0I) . 
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Figure  5-4a.  Comparison  of  ML,  least  squares,  average 
coefficient,  and  3-point  fit  estimation  for 
differencing  approach.  (ao«-0.5,  a2~0.01) . 
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Figure  5-4 j.  Frequency  distribution  for  MLE  of  aQ 
differencing  approach.  (aQ«0. 75,  o2»0. 02) . 
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5.2.2  MULTIDIMENSIONAL  MODEL  RESULTS 


The  model  in  the  multidimensional  case  is  given  by  Equation  (3.5), 

i.e. , 

Xi+I  " toi  + 

y..  = Hxi  + ni  i * 0,1,...  ,N  (5.4) 

(The  true  value  of  A in  (5.4)  will  be  denoted  by  A in  order  to 

o 

distinguish  it  from  the  parameter  A.) 

The  vector  noise  sequence  {r^l  is  assumed  to  be  a gaussian, 
independent,  identically  distributed  random  sequence,  each  member 
having  the  distribution: 

r\i  ^7)(0,R)  (5.5) 

The  known  coefficients  were  assumed  to  be  identity  matrices,  i.e., 

H - B - I (5.6) 

Only  two  of  the  original  four  identification  problems  were 
simulated  in  multidimensional  case  - xQ  known  and  the  autonomous 
version  of  xQ  as  an  unknown  parameter.  When  the  initial  condition 
xQ  is  known,  the  cost  function  is  given  by  Equation  (3.12) , and  the 
derivative  function  is  given  by  left  side  of  Equation  (3.15) . For 
the  case  where  xQ  is  an  unknown  parameter,  the  cost  function  (with  xq 
as  a parameter)  is  given  by  Equation  (3.26) , and  the  derivative  func- 
tion is  given  by  the  left  side  of  Equation  (3.33) , where  xq  has  been 
replaced  by  iQ. 

Because  of  dimensionality  problems,  several  measures  were  taken 
in  order  to  facilitate  the  displaying  of  cost  and  derivative  functions 
over  a range  of  parameter  values.  In  order  that  n x n matrix  parameter 
argument  a of  the  functions  be  represented  by  a scalar,  the  cost 
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functions  and  derivative  functions  were  evaluated  only  along  a vector 
in  n2  dimensional  space  passing  through  AQ.  The  derivative  functions 
are  also  n x n matrices.  Two  scalar  representations  of  these  were 
computed.  One  is  the  derivative  norm  which  is  the  sum  of  the  mag- 
nitudes of  the  elements  of  the  derivative  matrix.  (This  norm  is 
related  to  one  more  commonly  used,  the  element  with  largest  magnitude. 
The  latter  is  bounded  by  the  former  and  the  former  * n2.)  The  second 
scalar  representation,  the  directional  derivative,  corresponds  to  the 
inner  product  of  the  derivative  represented  as  a vector,  i.e.,  the 


I 


Figures  5-5a,  5-5b,  and  5-5c  display  the  cost  function,  norm  of 
the  derivative,  and  directional  derivative,  respectively,  along  A 

o 

for  3,  5,  10,  and  15  samples  (and  20  samples  for  the  cost  function) 

in  the  case  where  x is  known.  Figures  5-6a,  5 -6b,  and  5-6c  give  the 
9 

cost  function,  norm  of  the  derivative,  and  directional  derivative, 
respectively,  along  AQ  for  3,  5,  9,  13,  and  18  samples  in  the  case 
where  xq  is  an  unknown  parameter.  However,  the  cost  function  in  this 
latter  case  is  not  entirely  consistent  with  the  derivative  plots 
because  in  computing  the  cost  curves  the  true  xQ  was  used  instead 
of  (see  Equation  (3.30)). 

The  figures  for  the  multiparameter  case  follow. 
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Figure  5-5b.  Norm  of  derivative  function  along  AQ 
in  x0  known  case  for  3,  5,  10,  and  15  samples. 
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Figure  5-6a.  Cost  function  along  Aa  in  x0  unknown  parameter 
case  for  true  xQ  and  3,  5,  9,  13,  and  18  samples. 
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Figure  5-6b.  Norm  of  the  derivative  function  along  Aq 
in  xQ  unknown  parameter  case  for  xc  - xQ  and  3, 

5,  9,  13,  and  18  samples. 
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Figure  5-6c.  Directional  derivative  along  A0 
in  xQ  unknown  parameter  case  for  xQ  * xQ 


and  3,  5,  9,  13,  and  18  samples. 


5.3  DISCUSSION  OF  KLE  SIMULATION  RESULTS 


5.3.1  SCALAR  MODEL 

An  important  factor  in  the  selection  of  numerical  techniques  to 
locate  the  roots  of  the  likelihood  equations  is  the  behavior  of  the 
associated  derivative  function  in  the  neighborhood  of  the  desired  root 
or  better  yet,  over  some  area  within  which  the  root  is  nearly  certain 
to  lie.  The  no-noise  derivative  functions  of  groups  (h)  and  (i)  of 
the  previous  section  provide  some  deterministic  information  on  behavior 
but  only  in  the  limit  as  the  noise  variance  goes  to  zero.  However,  the 
question  of  sensitivity  of  the  behavior  of  the  (polynomial)  derivative 
functions  to  noise,  i.e.,  to  perturbations  in  its  coefficients,  must 
be  answered  before  the  usefulness  of  no-noise  information  in  this 
regard  can  be  assessed.  Lacking  well  established  conclusions  on 
sensitivity,  the  investigation  of  derivative  function  behavior  cannot 
be  limited  to  the  no-noise  results. 

The  figures  in  groups  (e) , (f) , and  (g)  provide  some  insight  into 
derivative  function  behavior.  (Because  this  study  is  concerned  with 
stable  systems,  the  figures  display  essentially  only  the  stable  range 
of  the  parameter  a.)  When  a,  aQ  £ (-1,1) , the  derivative  function 
generally  appears  very  smooth.  In  fact,  for  aQ  = -0.5,  except  when  xq 
is  an  unknown  parameter,  Figures  5-le,  5-lg,  5-3gr  5-4g  indicate  that 
the  derivative  function  is  basically  monotone.  (This  probably  would 
also  have  been  the  case  when  is  an  unknown  parameter  had  Equation 
(3.22)  been  used  in  the  simulation  instead  of  (3.21)  and  (3.20).) 

The  smoothness  continues  for  parameter  values  somewhat  less  than  -1, 
but  for  those  somewhat  greater  than  +1,  sharp  fluctuations  begin  to 


137 


r 


appear  in  groups  (e) , (f) , and  (g)  as  the  number  of  samples  increases. 
The  derivative  functions  for  the  differencing  approach,  Figures  5-4f 
and  5-4h,  do  not  exhibit  the  fluctuations  near  a = I.  (This  may  be 
related  to  the  fact  that  the  model  is  autonomous  for  the  differencing 
approach  simulations  while  for  the  three  other  cases,  = 1.0.)  The 
derivative  fluctuations  result  from  the  dips  that  occur  in  the  cost 
function  just  beyond  a = 1 as  illustrated  by  Figure  5-le.  (No  attempt 
has  been  made  to  ncover  any  physical  explanation  for  the  dips.) 

Comparing  group  (f)  in  which  a - 0.75  to  group  (g)  (or  group  (e) ) 

o 

in  which  aQ  = -0.5,  demonstrates  that  at  least  locally,  the  behavior 

of  the  derivative  functions  is  significantly  affected  by  aQ.  In  group 

(f)  the  curves  exhibit  a bump  adjacent  to  the  root  while  for  aQ  « -0.5 

except  for  the  case  where  xQ  is  an  unknown  parameter,  the  curves 

tend  to  be  monotone  in  form.  This  characteristic  greatly  detracts  from 

the  general  applicability  of  Newton  type  numerical  methods  for  root 

evaluations  for  stable  a . 

o 

The  effect  of  noise  level  is  less  than  might  have  been  expected. 

As  pointed  out  in  the  previous  section,  for  relatively  low  noise  level 
(o2  = 0.01)  and  the  scales  selected,  the  resulting  curves  (group  (f)) 
are  practically  indistinguishable  from  the  corresponding  no-noise 
curves  of  group  (h) . (This  conclusion  is  less  accurate  for  the  differ- 
encing approach  when  a =0.75  as  shown  by  Figures  5-4f  and  5-4i,  and, 

o 

of  course,  is  inaccurate  on  a magnified  scale  as  demonstrated  by  the 
curves  of  group  (a).)  Even  for  moderate  noise  levels  (a2  = 1.0, 

= 2/3)  , group  (g)  indicates  the  derivative  functions  undergo  only 
relatively  minor  changes  for  stable  a.  (However,  the  spread  of  A does 
appear  to  change  noticeably.) 


t 


f 
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As  the  number  of  samples  Increases,  the  magnitude  of  the  slope 

generally  increases  with  both  roots  and  peaks  becoming  sharper.  In  the 

neighborhood  of  the  root  A,  in  particular,  the  slope  increase  indicates 

a lowering  sensitivity  of  A to  noise  perturbations.  (See  Section  5.5.) 

While  this  trend  probably  continues  indefinitely,  it  likewise  probably 

does  not  continue  without  bound  for  stable  systems  and  a € (-1,1) 

because  the  derivative  functions  are  polynomials  in  a.  Since  the 

measurement  noise  has  a gaussian  distribution,  the  y and  thus  the 

slope  at  A cannot  be  bounded  in  a deterministic  sense.  Still,  the 

slope  is  most  likely  bounded  in  probability  if  not  with  probability  one 

Because  the  derivative  functions  are  polynomials,  the  possibility 

of  multiple  real  roots  must  be  faced.  The  question  of  multiple  roots 

takes  on  added  importance  as  the  mmber  of  samples  increases  because 

the  degree  of  the  derivative  function  polynomial  grows  with  the  nimber 

of  measurements.  The  figures  of  groups  (e) , (f) , and  (g)  verify 

th».t  multiple  real  roots  can  occur,  but  in  none  of  the  figures  does 

©re  than  one  root  fall  in  the  interval  (-1,1).  Furthermore,  in  the 

curves  of  groups  (a) , (b) , and  (c) , in  the  derivative  functions  curves 

of  groups  (e) , (f) , and  (g)  and  in  the  frequency  histograms  of  group 

(j) , A is  always  stable  (except  for  one  instance  with  only  three 

samples  in  the  differencing  approach  frequency  distribution  of  Figure 

5-4 j) . There,  of  course,  is  nothing  preventing  a from  taking  on 

unstable  values  if  a is  stable.  All  that  can  be  concluded  from  the 

o 

abov*  is  that  at  least  up  to  moderate  noise  levels,  the  spread  in  A 


distribution  is  relatively  small. 


J 


The  no-noise  derivative  functions  curves  of  Figures  5-lh,  5-2h, 
5-3h,  5-4h,  and  5-4i  provide  a reference  for  gauging  the  effect  of 
measurement  noise  on  the  shape  of  the  derivative  functions.  Also,  they 
demonstrate  that  in  the  limit  as  noise  level  goes  to  zero,  £ -*■  aQ  and 
£ is  the  only  stable  root.  Both  these  situations  were  predicted  by  the 
theoretical  discussions  of  Chapter  4 except  that  the  proof  of  the 
latter  condition  was  limited  to  autonomous  models  of  which  only  Figures 
5-4h  and  5-4i  are  examples. 

The  St  curves  of  groups  (a)  , (b)  , and  (c)  are  examples  of  the 

evolution  of  the  MLE  of  aQ  as  the  number  of  samples  increases.  Each 

group  was  computed  at  a different  noise  level.  (In  both  group  (a)  and 

group  (b) , o2  * 0.01,  but  in  group  (a) , x = 2/3,  and  in  group  (b) , 

00 

* 4.)  Once  again,  the  data  lead  to  the  conclusion  that  low  and 
moderate  noise  levels  present  no  difficulty  for  the  ML  estimation 
schemes. 

Figures  5-1 j,  5-2 j , 5-3 j , and  5-4 j give  insight  into  the  distribu- 
tion and  consistency  of  a.  As  expected  from  theory,  the  MLE  under  the 
conditions  for  Figures  5-lj , 5-2j , and  5-3j  has  all  appearances  of 
being  consistent  and  gaussian  in  the  limit.  On  the  other  hand,  con- 
sistency of  a in  Figure  5-4 j is  not  obvious  although  considerable 
convergence  occurred  in  going  from  3 to  10  samples.  The  problem  here 
is  that  the  model  for  this  figure  is  autonomous.  Although  improvement 
in  the  estimate  can  be  expected  as  the  number  of  samples  grows,  con- 
latency  cannot  be  shown  for  the  autonomous  case.  In  fact,  in  any 
practical  situation,  improved  estimates  will  probably  sooner  or  later 
be  supplanted  by  degrading  estimates  as  the  signal  content  in  the 
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measurements  tends  to  fall  below  the  word  length  of  the  computer 
providing  the  MLE. 

5.3.2  MULTIDIMENSIONAL  MODEL 

The  figures  for  the  multidimensional  model  give  some  indication 

of  the  behavior  of  the  derivative  functions  for  x _ known  and  the 

o 

autonomous  version  of  xQ  as  an  unknown  parameter.  The  cost  functions 
in  both  cases.  Figures  5-5a  and  5-6a,  are  relatively  smooth  and 
symmetric  - at  least  along  a line  in  2 x 2 A-space  passing  through  AQ. 
The  curves  for  the  norm  of  the  derivative.  Figures  5-5b  and  5-6b, 
though  reasonably  smooth  in  the  range  shown,  take  a sharp  plunge  to 
their  minima. 

The  curves  for  the  directional  derivatives.  Figure  5-5c  and  5-6c 
behave  less  dramatically  than  those  for  the  norm  of  the  derivative  and 
have  the  more  familiar  form  of  a scalar  parameter  quadratic  cost  func- 
tion derivative.  Figure  5-5c  corresponds  directly  to  the  cost  function 
of  Figure  5-5a,  but  Figure  5-6c  corresponds  only  in  an  approximate 
sense  because  Figure  5-6a  is  based  on  the  true  xq  while  Figure  5-6c 
is  based  on  StQ.  Note  that  while  the  directional  derivative  curves  pass 
through  zero,  the  normed  derivative  curves  do  not,  because  there  in 
general  is  non-zero  gradient  vector  orthogonal  to  the  Aq  direction  when 
the  directional  derivative  passes  through  zero. 

In  all  four  derivative  figures,  "c"  is  obvious  and  close  to  c = 1 
(or  "c0") . However,  c is  unique  for  the  range  shown  in  the  directional 
derivative  figures,  but  there  are  local  minima  in  normed  derivative 
figures.  Thus,  for  numerical  evaluation  of  A the  combination  of 
moving  the  solution  along  directional  derivative  type  curves  but 


measuring  convergence  by  some  norm  of  the  derivative'  function  appears 
to  have  merit. 


5.4  CALCULATION  OF  THE  MLE  AND  ITS  APPROXIMATIONS 

Expressing  the  derivative  functions  in  the  form  of  polynomials 
does  have  the  advantages  that  if  the  coefficients  are  bounded,  the 
functions  will  be  bounded  and  will  possess  all  derivatives.  Further- 
more, a wealth  of  literature  exists  on  the  properties,  analysis,  and 
solution  of  (scalar)  polynomial  equations.  Also  well  known  is  the  fact 
that  the  roots  of  a polynomial  may  be  difficult  to  locate  numerically. 

5.4.1  SOLUTION  OF  THE  LIKELIHOOD  EQUATION 


The  roots  of  a polynomial  may  be  well  defined  mathematically,  but 
a useful  definition  for  numerical  work  is  not  as  clear.  The  most 
consnon  definition  and  the  one  used  in  the  simulation  studies  is  that 
any  value  a is  a root  of  the  likelihood  equation  D,,(a)  = 0 if 
\Dfl(<*)  I < e » e > 0. 

Having  established  a definition  for  a root  of  the  polynomial,  the 
next  step  is  to  select  some  technique  to  find  the  root.  The  classical 
numerical  technique  for  solution  of  likelihood  equations  is  Fisher’s 
'scoring  for  parameters',  the  'score'  being  the  derivative  function 
evaluated  at  the  latest  estimate  of  the  root  (Rao  [1965,  p.  302]). 

The  process  is  basically  a Newton  type  iteration  and  has  the  following 
form  for  a sample  size  of  n and  parameter  0s 


9i  + 


9 logL 
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/{nKQ^  } 


where  the  information  I( 9)  is  defined  as 


(5.7) 
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(5.8) 


The  basis  for  this  method  is  that  generally  in  the  classical  considera- 
tions of  maximun  likelihood  estimation 


i a.s. 

n 96* 


- 1(6) 


(5.9) 


The  numerical  analysis  literature  offers  a variety  of  methods  to 
determine  roots  of  polynomials,  e.g.,  see  Busk  and  Svejgaard  [1962], 
many  of  which  could  be  more  desirable  in  specific  situations,  if  not 
in  most  situations  as  contended  by  seme,  than  scoring  for  parameters. 

The  methods  can  be  considered  as  members  of  one  of  two  groups,  direct 
or  iterative.  The  direct  methods  are  recursive  and  make  no  use  of  any 
initial  estimate  of  the  roots.  Generally,  for  reasonably  behaved 
functions  for  which  there  is  at  least  some  rough  information  on  root 
locations,  the  iterative  methods  are  more  effective.  Barnett  [1966] 
compares  Newton-Raphson  (method  of  tangents) , fixed  derivative  Newton, 
scoring  for  parameters,  and  regula  falsi  (method  of  chords)  methods 
for  the  solution  of  likelihood  equations  with  multiple  roots.  He 
concludes  that  regula  falsi  is  most  easily  controlled  and  most  reliable 
for  seeking  out  the  desired  roots  and  in  addition  locates  only  roots 
which  correspond  to  maxima  or  minima,  as  the  case  may  be.  Jennrich  and 
Sampson  [1968]  review  steepest  descent,  Newton-Raphson,  and  Gauss-Newton 
as  applied  to  non-linear  least  squares  estimation.  They  state  that  of 
the  three,  Gauss-Newton , an  iteration  method  wnich  applies  standard 
linear  regression  to  a linearized  version  of  the  non-linear  least 
squares  problem,  is  most  popular  because  it  specifies  the  step  size  for 


the  iteration  and  does  not  require  second  derivatives.  (For  the  studies 


143 


reported  in  this  chapter,  regula  falsi  appeared  most  appropriate  and 
was  the  only  technique  used.) 

One  other  aspect  of  the  numerical  solutions  should  at  least  be 
mentioned . The  theory  on  convergence  of  most  standard  numerical  root 
solving  methods  is  reasonably  well  developed.  However,  since  the  roots 
of  the  likelihood  equation  and  the  sequence  of  approximations  resulting 
from  the  process  of  numerical  solution  of  the  roots  are  random  variables, 
the  usual  convergence  criteria  must  be  reconsidered  from  a statistical 
point  of  view.  Large  sample  (stochastic)  convergence  for  Newtcn- 
Raphson  and  scoring  for  parameters  is  shown  by  Kale  [1961]  for  solution 
of  the  (classical)  scalar  likelihood  equation  and  by  Kale  [1962]  for 
the  (classical)  multiparameter  likelihood  equation.  Jennrich  [1969] 
shews  large  sample  convergence  of  the  Gauss-Newton  method  applied  to 
non-linear  least  squares. 

5.4.2  APPROXIMATIONS  TO  THE  MLE 

The  maximum  likelihood  estimators  developed  in  Chapter  T grow  in 
total  number  of  terms  as  the  number  of  samples  grows  and  require  the 
entire  measurement  sequence  and  input  sequence  to  be  stored.  These 
characteristics  could  preclude  real  time  application  of  the  estimators, 
particularly  if  a completely  updated  MLE  is  demanded  after  each  new 
sample.  Obviously,  given  enough  samples,  any  computer  would  eventually 
become  clogged  to  the  point  where  further  evaluations  of  the  estimate, 
real  time  or  not,  would  be  impractical. 

The  iterative  root-finding  numerical  methods  typically  require 
evaluation  of  the  likelihood  equation  (and,  in  some  cases,  also  the 
derivative  of  the  likelihood  equation)  for  each  iteration.  The  least 
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expensive  evaluation  of  a general  polynomial  is  given  by  Horner's 


method  (Lyusternick , et  al.  [1965,  p.  10]).  For  an  nth  degree  poly 


namial,  n additions  and  n multiplications  are  required 


Clearly,  the  computational  constraints  dictated  by  many  practical 


situations  can  be  met  only  by  approximating  the  maximum  likelihood 


estimate.  This  problem  and  some  possible  responses  have  been  given 


limited  discussion  in  Chapter  4.  The  average  coefficient  method  was 


proposed  as  an  analytical  approximation  to  the  MLE . The  equations 


for  the  scalar  parameter  estimates  are  repeated  below.  Equation  (5.10) 


(see  Equation  (4.84))  is  the  two-sample  approximation  for  the  x known 


x unknown  parameter,  and  x unknown  randcm  variable  cases,  and  Equation 


which  is  more  numerical  in  nature  than  the 


average  coefficient  method,  that  also  appears  to  have  merit  is  a form 


of  curve  fitting  which  exploits  a recursive  aspect  of  the  likelihood 


equation.  Through  curve  fitting,  the  MLE  can  be  in  theory  approximated 
to  any  degree  desired  whereas  the  extent  to  which  the  average  coeffi- 


cient estimate  approximates  the  maximum  likelihood  estimate  is  relatively 


unclear.  The  method  consists  of  initially  selecting  several  specific 

a 

values  of  the  parameter  variable  a such  that  the  area  in  which  a is 
expected  to  lie  is  spanned  and,  after  each  new  sample,  recursively 
evaluating  the  derivative  function  at  these  points . The  appropriate 
root  of  a curve  fitted  through  the  updated  points  on  the  derivative 
function  is  taken  as  the  approximation  to  a. 

This  recursive  curve  fitting  method  results  in  a substantial 
reduction  of  computations  and  storage  - a reduction  by  a factor  of 
approximately  N,  the  total  number  of  samples  at  the  time  of  computation. 
There  are  seme  problems  associated  with  this  method  which  tend  to  offset 
its  computational  advantage.  The  total  number  of  points  which  must 
be  computed  depends  on  the  size  of  the  region  in  which  a is  expected  to 
lie  and  the  precision  to  which  a must  be  known.  As  a begins  to  stabilize 
after  several  samples , moving  the  points  to  improve  the  approximation 
would  be  desirable.  There  appears  to  be  no  way  to  move  the  points 
without  making  some  approximations.  Also,  curve  fitting  can  introduce 
extraneous  roots  or,  conversely,  could  result  in  a curve  which  has  no 
real  roots  in  the  region  of  a. 

5.4.3  SIMULATION  OF  THE  MLE  APPROXIMATIONS 

To  provide  some  indication  of  performance,  simulations  of  the 
two-sample  average  coefficient  method  and  a 3-point  recursive  parabolic 
fit  to  the  derivative  functions  were  made.  (The  fit  was  arranged  with 
aQ  midway  between  the  second  and  third  points  to  give  a worst  case 
when  aQ  is  spanned  by  the  points.)  For  comparison,  the  results  are 
presented  along  with  the  MLE  and  a least  squares  estimate,  derived  as 
follows.  From  Equation  (3.50): 
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(5.14) 


or. 
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(5.15) 

(5.16) 


The  curves  of  group  (a)  (Figures  5-la,  5-2a,  5-3a,  5-4a) , group  (b) 
(Figure  5-lb) , and  group  (c)  (Figure  5-lc)  display  the  evolution  of  a 
for  maximxxn  likelihood  (MLE) , 2-sample  average  coefficient  (AVC) , 3- 
point  recursive  fit  (3PT) , and  least  squares  (LSQ) . The  results  indi- 
cate that  for  low  noise  levels  (group  (b) ) , both  least  squares  and 
average  coefficient  do  well  compared  to  MLE  for  all  but  the  autonomous 
differencing  approach.  As  the  noise  level  increases  (group  (a)  and 
group  (c) , respectively)  the  amount  by  which  they  are  in  error  increases 
substantially  relative  to  the  MLE,  though  average  coefficient  performed 
noticeably  better  than  least  squares.  (Since  theory  in  Chapter  4 
indicated  consistency  for  the  average  coefficient  method  under  the 
conditions  of  the  simulation,  it  should  have  performed  reasonably  well.) 
On  the  other  hand,  for  the  autonomous  differencing  approach,  Figure 
5-4a,  for  example,  least  squares  and  average  coefficient  drift  toward 
zero.  (This  drift  of  the  average  coefficient  a was  expected  because 
the  limiting  root  was  shown  to  be  zero.) 
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The  3-point  recursive  fit  follows  the  MLE  in  group  (a)  and  (c) 
but  drifts  away  in  group  (b) . The  drift  in  group  (b)  is  most  likely 
due  to  the  peak  in  the  derivative  curves  near  aQ  for  group  (b)  (see 
Figure  5-2f ) . 

In  the  description  of  group  (b) , the  effect  of  increasing  the 
separation  of  the  points  from  0.1  to  0.2  was  mentioned.  The  error 
increased  from  a few  per  cent  for  0.1  separation  (as  observed  from  the 
curves)  to  about  10%  at  30  samples  for  0.2.  Thus,  for  3 points 
separated  by  0.1  and  a priori  knowledge  of  aQ  to  within  ±0.1,  the 
figures  show  that  the  a approximation  is  good  to  nearly  two  significant 
figures  whereas  for  the  0.2  situation  barely  more  than  one  significant 
figure  is  obtained. 

5.5  COMPUTATIONAL  ERRORS 

The  numerical  error  resulting  from  the  finite  word  length  of 
digital  computers  can  have  a substantial  effect  on  the  reliability 
of  computations.  The  problem  can  arise  at  either  end,  so  to  speak, 
roundoff  or  cancellation.  (For  a double  precision  accunulator  and 
single  precision  storage  with  floating  point  operations  these  events 
are  mutually  exclusive  for  any  single  operation.)  The  problem  of 
differencing  nearly  equal  quantities  (cancellation)  was  observed  to 
occur  during  evaluation  of  the  derivative  functions  at  some  distance 
from  *o  or  AQ,  but  generally  only  when  |a|  > 2 in  the  scalar  case.  Also 
for  |a|  > 2,  exponential  overflow  can  easily  occur.  Neither  of  these 
problems  is  of  much  interest  here  however  since  most  of  the  computational 
effort  would  be  confined  to  the  neighborhood  of  the  root  a. 

The  problem  of  roundoff  is  discussed  in  detail  by  Wilkinson  [1963] . 
For  double  precision  addition,  normalization,  and  rounding  to  single 
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precision  for  storage,  he  shows  floating  point  addition  *1+x2 
(x 2+x 2) ( 1+e }) , |ej|  < 81_t.  (B  is  the  base  for  the  computer's  arith- 
metic and  t is  the  number  of  digits  in  the  single  precision  mantissa.) 


Similarly,  the  floating  point  multiplication  *j*2  yields  XjX 2(l+z2) 


< 8 


1-t 


Following  a development  of  Adams  [1967] , an  estimate  of  the  round- 
off error  in  polynomial  evaluation  can  be  made.  Let  a - | I 
n m |e2|.  Consider  the  polynomial 

i(»)  - c + c.a  +...+  caP 
o i n 

Horner ' s recurrence  for  computing  l (x)  is 


bo  " cn 


^ f k * <2  f • • • /Ji 

where 


(5.17) 


l(x) 


The  roundoff  accumulation  on  the  /cth  step  in  the  evaluation  of  the 
polynomial  can  be  described  as 

+ ek>  = ( I bk~l I + ek)(l+*)  + |cn_*|7  ( 1 + o)  (5.18) 

Expanding  (5.18) , rearranging,  and  dropping  higher  order  terms  gives 
the  following  error  recursion, 

sk  m Ma*-1  * Mfcl  (5.19) 

where 

6. 


ir+o 


and 


TT+O 


Let  lQ  (x)  be  the  true  value  of  the  polynomial  and  £ (x)  be  the  computed 
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value . Then 


| (x)  -l  (x)  | < fn+oJ  <5n- 1 2>n  | * 

(Note  that  the  coefficients  c ^ ■ c^(yQ, . . . ,yN,  u£J,.../un_j^  must  also 
be  computed  and  consequently  are  in  error) . 

Wilkinson  demonstrates  that  root  locations  can  be  extremely 
sensitive  to  coefficient  perturbations.  As  a measure  of  the  sensitivity 
of  the  roots  he  develops  what  he  calls  the  'condition  of  a polynomial'. 
Briefly,  Wilkinson  proceeds  as  follows  to  arrive  at  the  condition. 

Let  a be  a root  of  1(a).  Consider  the  zero  of  l (a)  +zg  (a)  where 

g(a)  = g0  + gza  + ...+  gnan  (5.20) 

By  the  theory  on  series  reversion,  the  change  in  root  location  can  be 
bounded  for  sufficiently  small  e as 

a (c)  - a + 


_g_(&)_  < 
V (a) 


(5.21) 


where 


or, 


V = JL  1(a) 
da 


k(z)  - a * e-2i|L  , e + 0 (5.22) 

)L  (a) 

As  expected,  the  crucial  factor  in  root  sensitivity  is  the  slope  at  the 
root.  In  the  earlier  discussions,  the  fact  that  the  slope  of  the  deriv- 
ative functions  at  3 increases  as  the  number  of  samples  increases  was 
pointed  out. 
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SECTION  VI 


SUMMARY  OP  RESULTS  AND  CONCLUSIONS 

Likelihood  equations  expressing  the  MLE  for  each  of  four  initial 
condition  assunptions  were  derived.  The  finite  sample  and  large  sample 
characteristics  of  the  estimators  were  examined.  Computationally 
efficient  approximations  to  the  MLE  were  proposed  and  investigated. 

The  finite  sample  and  large  sample  characteristics  as  well  as  the 
approximations  are  discussed  for  models  without  plant  noise  whereas 
likelihood  equations  are  presented  for  both  models  with  and  without 
plant  noise. 

The  likelihood  equations  based  on  the  scalar  model  can  be  expressed 
as  polynomials  in  the  unknown  parameter  a for  each  of  the  four  initial 
condition  situations  considered.  For  the  vector-matrix  model  (without 
plant  noise)  the  likelihood  equations  for  initial  condition  xQ  known 
and  xQ  unknown  parameter  are  again  polynomials  but  now  are  polynomials 
in  the  matrix  A with  matrix  coefficients.  The  character  of  the  poly- 
nomials varies  considerably  with  initial  condition  assumptions . One 
particularly  interesting  observation  in  this  regard  is  that  the  MLE 
without  plant  noise  when  xQ  is  known  or  is  an  unknown  parameter  does 
not  explicitly  depend  on  the  variance  of  the  measurement  noise.  Also, 
the  degree  and  complexity  of  the  polynomials  increase  with  the  number 
of  samples  upon  which  the  estimate  is  based.  This  expansion  appears  to 
be  unavoidable  because  by  the  work  of  Dynkin  no  sufficient  statistic 
other  them  the  trivial  one  (all  the  samples)  exists  when  xQ  is  known 
or  an  unknown  parameter.  The  same  conclusion  is  expected  to  hold  for 
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the  other  two  initial  condition  cases  since  in  those  situations  the 
samples  are  neither  independent  nor  identically  distributed. 

For  finite  samples  the  scalar  MLE  in  all  four  cases  approaches  the 
true  parameter  value  as  the  noise  goes  to  zero.  A similar  but  stronger 
result  is  given  by  Theorem  4.1  which  says  that  on  the  average  the  true 
parameter  value  is  a root  of  the  likelihood  equation.  The  converse  of 
that,  the  average  of  the  roots  of  the  likelihood  equation  corresponds 
to  the  true  value  of  the  parameter,  was  not  shown  nor  is  it  clear  that 

such  is  the  case.  Also,  in  the  limit  as  the  measurement  noise  goes  to 

zero,  the  MLE  for  stable  scalar  autonomous  models  is  the  only  stable 
root  of  the  likelihood  equations  and,  as  already  mentioned,  is  equal 
to  the  true  parameter  value.  The  simulations  indicate  that  this  con- 
clusion may  be  nearly  true  even  for  non-zero  forcing  functions  and  up 

to  moderate  noise  levels  if  the  true  parameter  value  is  stable  and  not 

in  the  neighborhood  of  ±1,  but  extraneous  roots  do  occur  outside  of 

(-1,1). 

Concern  for  possible  problems  that  could  be  encountered  in  the 
numerical  solution  of  the  polynomial  likelihood  equations  is  lessened 
by  evidence  from  the  simulations  that  for  stable  models  the  derivatives 
of  the  likelihood  functions  in  the  interval  (-1,1)  are  relatively 
smooth  and  rather  insensitive  to  perturbations  from  noise  up  to  moderate 
levels.  A development  by  Wilkinson  shows  that  an  important  factor  in 
the  sensitivity  of  roots  of  polynomials  with  respect  to  perturbations 
in  their  coefficients  is  the  magnitude  of  the  slope  of  the  polynomial 
at  the  roots.  In  the  simulation  results,  the  slope  at  the  root 
corresponding  to  the  MLE  was  observed  to  increase  as  the  number  of 
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samples  increased.  The  shape  of  the  derivatives  of  the  likelihood 
functions  varies  considerably  with  true  parameter  value  and  can  have 
peaks  in  the  neighborhood  of  the  roots.  In  the  light  of  this  and 
discussions  by  Barnett,  regula  falsi  makes  a good  choice  for  a numerical 
method  to  compute  the  roots  of  the  scalar  likelihood  equations. 

The  MLE  settles  down  after  the  first  few  samples  in  the  simulations. 
Any  improvement  after  that  comes  about  rather  slowly  if  at  all.  When 
xQ  is  an  unknown  random  variable,  even  a relatively  large  difference 
between  the  actual  initial  condition  and  the  mean  initial  condition 
appears  to  have  no  significant  effect  except  for  a transient  during  the 
first  couple  of  estimates.  The  ML  estimators  generally  performed  well 
on  an  absolute  scale  as  well  as  relative  to  least  squares  and  any 
approximate  ML  estimators.  Though  the  form  of  the  estimator  depends 
on  the  initial  condition  assumption,  they  appear  to  perform  similarly 
for  the  same  set  of  measurements. 

The  cost  and  derivative  functions  for  matrix  parameters  appear 
relatively  smooth  over  the  region  of  interest  in  the  situations  which 
were  simulated.  Along  a vector  through  the  true  matrix  in  n2  A -matrix 
space,  the  roots  (or  minima,  as  the  case  may  be)  of  the  derivative 
functions  occurred  in  the  neighborhood  of  the  true  value  of  A when  the 
noise  level  was  relatively  low. 

The  only  large  sample  property  investigated  was  consistency.  The 

MLE  when  x^  is  known  was  shown  to  be  consistent  for  stable  scalar 
o 

models.  The  same  arguments  for  consistency  appear  to  hold  also  when  xQ 
is  an  unknown  parameter  and  when  xQ  is  an  unknown  random  variable. 

Because  of  lack  of  uniqueness  in  the  limit,  consistency  for  autonomous 
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models  was  not  shown.  Though  improvements  in  the  MLE  cam  be  expected 
initially  for  stable  autonomous  systems , continued  improvement  that 
may  well  be  theoretically  possible  for  increasing  but  still  finite 
numbers  of  samples  cannot  be  expected  to  occur  because  of  the  finite 
word  length  of  any  processing  computer.  The  Monte  Carlo  simulation 
results  tend  to  substantiate  the  above  conclusions. 

To  overcome  the  growth  in  complexity  of  the  MLE  computed  with 
increasing  nuobers  of  samples,  two  approximations  to  the  ML  estimator 
are  proposed  — average  coefficient  and  recursive  curve  fitting.  The 
average  coefficient  approximation  can  be  based  on  any  length  string  of 
samples  but  only  the  two-sample  form  was  considered.  This  approximation 
scheme  applies  in  both  scalar  and  matrix  situations,  but  only  to  the 
differencing  approach  and  xQ  unknown  parameter  case.  Since  there  is  no 
simple  way  to  account  for  the  initial  condition  information  in  the 
cases  where  xq  is  known  or  xQ  is  an  unknown  random  variable,  the  unknown 
parameter  approximation  was  assuned  to  serve  for  these  cases  also. 

Both  the  two-sample  approximations  are  related  to  other  approximations 
and  estimators  in  the  literature. 

The  estimates  in  both  average  coefficient  approximations  approach 
the  true  parameter  value  as  the  noise  goes  to  zero,  and  the  expectation 
of  their  noise  terms  is  zero.  The  one  for  xQ  unknown  parameter  always 
has  two  real  roots,  one  in  [-1,1]  and  the  second  in  (-1,1) C,  and  gives 
a consistent  estimate  if  the  input  to  the  system  is  a constant.  In  the 
one  for  the  differencing  approach,  the  root  locations  are  roughly 
similar  if  the  noise  level  is  low.  For  this  case  the  large  sample  root 
is  zero  if  the  model  is  autonomous.  The  performance  of  these  methods 
in  the  simuhtions  lies  between  least  quares  and  ML. 
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The  curve  fitting  method  can  be  based  on  any  number  of  points  but 
the  greater  the  number  of  points  the  less  the  computational  advantage 
becomes.  (The  same  is  true  for  string  length  in  the  average  coefficient 
methods.)  For  three  points  separated  by  0.1  which  span  the  true  param- 
eter value,  the  simulation  of  this  approximation  yielded  an  estimate 
good  to  nearly  two  significant  figures.  This  approximation  performs 
well  if  the  derivative  of  the  likelihood  function  is  smooth  near  the 
desired  root,  but  as  presented  can  be  used  only  with  scalar  likelihood 
equations . 

A number  of  questions  remain  only  partially  answered.  Evidence  on 
the  usefulness  of  initial  condition  information  was  not  very  conclusive. 
The  numerical  aspects  of  the  solution  of  the  matrix  likelihood  equations 
as  well  as  the  properties  of  the  solutions  were  only  touched  upon. 

Finite  sample  root  distributions  were  not  firmly  established. 

There  are  many  possible  extensions  to  this  study  including  increas- 
ing the  unknown  parameters  to  include  the  H and  B matrices  or  parameters 
in  the  noise  distribution,  input  measurement  noise,  time  varying 
coefficients,  and  A matrices  some  of  whose  elements  are  known.  As  a 
special  case  of  the  latter,  further  studies  of  the  companion  matrix 
form  of  A should  be  considered  since  the  multiparameter  identification 
problem  is  often  posed  in  this  form. 
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DETERMINANT  AND  INVERSE  OF  MEASUREMENT  COVARIANCE  - xQ  RANDOM  VARIABLE 


Let  R ■ al  + 3§aT 
where : 


A. I 


aT  * (l,a,... ,aN  ) 

I = identity  matrix 

a,  3,  and  a are  real  numbers 


Theorem  A.l;  Let  R be  the  (N+l)  x (N+l)  matrix  defined  by  (A.l) . Then 
its  determinant  can  be  expressed  as 


N 


|j?|  - aN (a  + 3 l a2*) 
i=*0 


A. 2 


Proof : 

Since  the  rank  of  a is  one,  the  rank  A,  where  A & §aT , 
cannot  be  greater  than  one.  Clearly,  the  rank  of  A is  one. 

Then  if 

3 = 0,  |i?|  = aN , and  if  a = 0,  |r|  = 0. 

Assume  a and  6 not  zero,  a is  a non-trivial  eigenvector 
of  A , x.e. , 

Aa  = ("a  Ta,)  a_  A.  3 

Because  A is  symmetric,  there  exists  an  orthogonal  matrix  M 
such  that  A = M A Mt  where  A = diag  (a^a^,0 , . . . ,0)  . Then 

|M(aT  + 3A )Mr)  | = aN  (a  + 3aTaj  A. 4 | 

Theorem  A. 2;  Let  R be  the  (N+l)  x (N+l)  matrix  defined  in  (A.l).  Then 
its  inverse  can  be  expressed  as 

R-1  = -[l aa»]  A.  5 

oL  a+3  aTa  J 
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Proof : 


M - 

I 

! 

Jl  1 

/5T  i 
• 

N* 

i 

i 

4 

Then, 


R"1  - a-1 1 + (v/A;ynt 


where, 


f - 


or, 


-1  - a-»I  ♦ 


- a_1 11  - dbi  «1 
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A.b 


It  a ■ 0,  R-1  does  not  exist,  end  if  8 • 0,  R"1  » a"lX. 
Assume  a and  8 not  zero.  Then  by  (A. 3)  and  the  symmetry  of  A, 
t here  exists  an  orthogonal  matrix  N such  that 
MTM  - ol  + 6 diag  (\,0 , ,0) 
where 

X - ara 

The  inverse  of  (A. 6)  is 

RtR_1M  - diag/<'a+8X;~l,arl,...,a-1; 

■ a_1X  + diag (v ,0 , . . . ,0) 

where 

v ■ -8Xa_1/fa+6X^ 

Since  MMr  ■ MrM  • X , M must  have  the  form 


A. 7 


A. 8 


A. 9 


A.  10  | 
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APPENDIX  B 


DETERMINANT  AND  INVERSE  OF  TRI-DIAGONAL  TOEPLITZ  MATRIX 


L«t  Tn  be  an  N * N tri-diagonal  Toeplitz  matrix  where 

TN-  <>  I * B I0 

kis 

i\\  0 

I m ^ N\\ 

O \ \ 

\'v  * 

/I  ' , V 


and  where  I is  the  identity  matrix.  The  coefficients  a and  B are  real. 


B.l  THE  DETERMINANT  OF  TN 

The  eigenvalues  of  l0  are  the  roots  of  hN(\)  where 

An  - detCXr  - I0)  B.3 

Based  on  Grenander  and  Szego  [1958,  §5.3],  of  (B.3)  can  be  expressed 
as  a recursion  from  which  the  roots  can  be  found. 


AW  " XAW-1  _ AN-2 


, N ■ J,4, • . • 


where 


Aj  a;  m 1 
A^a;  - x2  - 1 

Solving  the  difference  equation: 

z2  - Xz  + 1 - 0 B. 5 

or,  z1>2  - iaiA2-4  ; 8.6 

or,  Aw  ■ CjZj^  + c2z2w  B. 7 

The  coefficients  Cj  and  c0  are  difficult  to  evaluate  when  Aw  is 
expressed  in  the  form  of  (B.7) . The  following  change  of  variable  helps 
overcome  this.  Take  a such  that, 


Then  the  roots  in  (B.6)  become. 


*1#*2  - a, a 


The  solution  to  (B.4)  in  terms  of  a is 


Aw  - CxaN  + Cza~N 


The  coefficients  are  found  from  the  initial  conditions: 


Aj  - (l+a2)/a  = Cja  + Cz/a 


A2  - l(l+2a2+a'')/a2]  - 1 


(l+a2+al* ) /a2  « Cxa*  + C^/a1 


Cx  = -a2/ (1-a2) 


C2  = 1/ (1-a2) 


(B.12) 


From  (B. 10) , 


l-a2N+ 2 
w (l-a2)aN 


The  roots  of  A^  in  terms  of  a are  roots  of  unity.  Noting  that 


a B 11  is  not  a root  of  Aw,  the  roots  are 


.k  - 


k = 1, ,N ,N+2 ,. . . ,2N+1  B. 14 


or  in  terms  of  X , 


X,  = e 

k 


2 cos 


k ~ 1,...,N,N+2,...,2N+1  B. 15 


Since  TN  is  N x N,  there  are  N eigenvalues.  In  (B.15) , roots  for 


*V+2  < k < 2N+1  duplicate  those  for  1 < k < N . Therefore,  the  rignn- 


values  of  (B.2)  are 


, - vk 

X,  =2  cos  — - 
* W+l 


k — 1,2,... ,W 


j 


MMIKHIMImM 


Because  JQ  is  real  and  symmetric,  there  exists  a similarity  transforma- 
tion to  diagonalize  IQ.  Thus, 


N 


|rj  • TT  (0  + 26  cos  “Jr; 
k-1 


B.  17 


As  a special  case,  let  o - a2(l+r2)  and  0 - -o2r.  Then  from  B.13 
taking  r - -a, 

„ ..  l-r™+2  

B.  18 


1*1  - (-*2r;"  ^ 

N (1-r2)  (~r)N 


- (a2)N (l+r2+. . ,+r2N) 


(The  result  in  (B.18)  could  have  been  obtained  somewhat  more  directly 
by  factoring  TN  into  the  form  G^G^r .) 


B.2  INVERSE  OF  T, 


N 


Let  Tn  *»  and  T^-1  - . Since  TN  is  symmetric, 

need  be  determined  only  for  j > i . 

The  cofactor  of  t^,  j i i , has  the  form: 


’ Ti-l!  *i  J 

b2 

r 

! Dj-i 

0 J 1 

i 1 

: 0 ! 

1 \ 

TN-j 

where  Bj-j  is  a (j-i)  * (j-i)  upper  triangular  matrix  with  0 along  the 
diagonal  and  Tg  is  of  the  form  (B.l) , Than 

Hi  * 

- f-IUJ'1|rJ_J|  |rff_j|/|rw|  i>i  B.19 

where 


To  - 1 


As  a special  case,  let  a - o2 (1+r2)  and  8 - -a2r.  Then  for  j > i, 


, i-1  N-j  N 

t ij-jil  I l r UKt).}-!,,,  J tlv, 

sm0  t~0  v»0 


B.  20 
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APPENDIX  C 

THE  LIKELIHOOD  EQUATION  FOR  THE  SCALAR  AUTONOMOUS  DIFFERENCING  APPROACH 

From  Equation  (3.64) , the  likelihood  function  for  the  scalar 
autonomous  differencing  approach  after  N+l  samples  is 


Lg  - (2ir)'^|R|"/2exp  [--  y*vR~ly*] 


C.l 


a - fa2|j?,  I;"1 

afw  - ‘°2>“£M 

37  R'‘  " 57  **''  * « 37  B.‘‘ 


where 


37“  --^Kl2''1  57,Rl 


Prom  13.68) 


where 


i-1  N-j 

r,-‘-  I I .2 «•«**<./-/  ,jii 

J *-0  s-0 


-4|*,|  - 2 I p.21-1 

d*  p-0 

-1  - ^ [2  (s+k)+j-i]a2 

*-0  s-0 

Multiplying  through  (C.7)  by  o2|j?|2/fa2Jf*  gives: 
VH  + QMm  0 
where 

*v,  - -*2Msl*,l 

<?*“  + 1*1  Is  W 


♦ 2(yN°)r(Ri  (P£l)  - + «l*jl  df 

- + al*lls**1;y"° 

aa  °a  (C.  20) 

The  reduced  form  of  (C.20)  is  relatively  simple.  At  this  time 
no  simple  way  has  been  uncovered  that  yields  (3.73)  from  (C.20).  An 
inductive  proof  will  be  given  instead  after  the  following  lengthy 
Lemma  is  established. 


I 


C.  21 


Lemma  Let  Vi"  Q„  - 


Then, 


N-l 


*n-i  m 2 l (N~k)a  y» 

k-0 


N-l  N 

+ 2 l l (N+i-2p) 
i*0  p=0 


N*  i * lf-i 

a yj  % 


N-l  N-l 

’ll  <2N-j-r)a 
r=0  j=0 


2H*j+t  -i 


yeyj 


Proof: 


From  (C.20)  and  the  preceding  definitions: 
N 


-J  . . r 1r1  Nr1  . 

a )y  i 1 

p*=0  i=2  k~0  s=0 


r 1#- J V.  "r~  » 

QM  - ( l *P*'  >t  l ( l l - 


N N-l  N i-1  N-j 

*2,\  [ ! (J  J 


p=Q 
N 

l 

p=0 

N 


i-1  j-i+1  k=0  s=0 
N i -2  N-i 

i a i 

i-1  Jc-0  s— 0 


-fZ  a2'; a rl  I ms+JOla^'-Wl 


„ N-l  N i-1  N-j  . 

- 2C  I a *)[  l l ( l l [2(s+k)+j-i]a2(,*k,*3’1‘ 

p=0  i-2  j=i+l  Jc-0  s-0 


N 


><■  I .*'»  I I rT  I 


N N i-2  tf-j 


*<*•**  >*/-i 


p-o 

N 

I 

p-0 

n 


i-1  j-i  *-0  s-0 
N-2  N i-2  N-j 

a i a i 

i-2  j=i+2  k**0  s=0 


Jy*Vl 


*2,1  i ,'f  "r 


, , w w i-I  M-j  ,,  . . 

-2a,  l 2P.1'-1, 1 l hi  l 

p=0  i=l  j=i  k=0  s=0 


N 


, . N-2  N i-1  N-j  . . 

2a(  l 2pa‘*-1,,  l III  l , 

p-0  i-2  j-i+2  A— 0 s-0 


C.  22 


%iyj7 
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N N N i-1  N-j 

+ 2a<l  a*')tl  l l l [2(a+k)+j-i]a*<*~k)*i'i'x  )yj.g  yj] 
p-0  i-1  j-i  k-0  s-0 


N N-l  N i-1  N-j 

+ 2a  ( l aZ*)[ 


P-0 


l l l ( I I I2(s+k)+j 
i-i  jJi+i  k-o  s-o 


-Ua  iyjyj.i) 


- 2.1  l ,‘')t  1(1  l .‘'-“>2,-/1 

p-0  i-1  k-0  s-0 


N „ N-l  N i-1  N-j  ..  , . 

-*•'  * r r ;y<|.  y j 

m 0 


N N-J.  N J N-j 

- 4.<  l J J i l l 

p-0  i-1  j-1+1  k-0  s-C 


2-  p Zp-i.  . r p p Z(9*k)  . £ , 

+ a‘(  l 2pa  * )[  l ( l la  )y  ,.j  ] 

p-0  i-1  k-0  s-0 


N 


N-l  N i-1  N-j  . 

2a2(l  2pa*'-1)rl  l ( l l 

p-0  i-1  j-i+l  k-0  s-0 


N N i-1  N-i 

- a*(  l a**)ll  l l [2(s+k)]a 

p-0  i-1  k-0  s-0 


>2  i.,2,.,1 
‘"•"">1,-/1 


N 


. « % N-l  N i-1  N-j  ..  ... 

2.‘<l  .‘'III  l I l l l2t.>k>*l-il.‘,“k,-’-*lli.lliJ 

p-0  i-1  j-i+1  k-0  s-0 


(C. 23) 


Forming  QN  - C^_j  and  accounting  for  the  fact  that  the 
inverses  (C.14)  hold  only  for  j > i gives: 

Qn-1 

ZN-1,  r?  /r1  Nr1  *<•**)  . 2, 

- 2Na  [ I ( l la  )y  { 1 

i-1  k-0  s-0 

N-l  , . N i-1  _ , , , 

+ f I J ( l a,,M4*,j»ii] 

p-0  i-1  A-0 


♦ -.""a  i f z j * 

i-1  j-i+1  *-0  s-0 


N-l  N i-1  N-j 
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I 


*2(*l  *»*»*,  ”l  l *1  >MtJ 

P-0  1-1  j-i+1  kmO 

...  N 1-1  N-i 

~ * l l <1  l [2(s+k)]a*(mth)'l)y/] 
i-1  k-0  s-0 


- "l  i E n uu- 

p-0  i-1  k-0 

, N-l  N 1-1  N-j 

- 2az*[  l l (l  l [2(s+k)+j-i]a*C-*,*J’t'i  )yiyjJ 

i-1  j-i+1  k-0  s-0 

-2(1  a2p)[  l l (l  [2(N+k)-j-i]a*  * Jyj  y •; 

p-0  i-1  j-i+1  k-0 

, . N N i-1  N-j  . . , 

+ *”'1  l (l  T S'-**'*  )yilty,l 

i-1  j-i  k-0  s-0 

N-l  , N N i-1  . . 

*’<l  “ '>  1 E l <1 

p-0  i-1  j-i  k-0 


N-l  N i-1  N-j  v . , 

♦ 1 E <1  l 

i-1  j-i+1  k-0  s-0 
N-l  N-l  N i-1  . . 

♦jfl  E < l 

p-0  i-i  j-i+1  k-0 


N N i-1  N-j  ^ . . 

•-”«[  E E E * VJj 

i— 2 j-i  k-0  s—0 

- 2l"l  2pS')t  l f ,Y 

p-0  i-1  j-i  k-0 


N-l  N i-1  N-j  . . 

- ««  / E l<l  l 

i-2  j-i+1  k-0  s-0 
N-l  N-l  N i-1 

-2(1  2p.*'),l  E (l 

p-0  i-1  j-i+1  k-0 
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N N i-1  N-j 

+ 2a2u*1[\  III  l [2(s+k)+j-i]a 

i~l  jmi  km0  sm0 


»(••*)  + j-l-i 


N-l  N N i-1 

+ 2(  l a2p**)[  l l ( l [2(N+k) -j-i]a 
pm0  i-1  jmi  k*0 

, N-l  N i-1  N-j 

+ 2a*"*J[  l l l l l t2(*+k)+j-i]a 
i*l  jmi+1  k=0  sm0 

N-l  N-l  N i-1 

+ 2{  l a2*'*)!  I III  [2(N+k)-j-i]a 
pmO  i*l  jmi+1  k~0 

, „ . N i-1  N-i  . . 

_ 2N*1.  r / r r Ks*k)  . 2, 

- 2a  l l ( l l a )yi_1  ] 

i~l  k~0  s*0 


2<H*k).  j 


N-l 


N i-1 


a _ u a a 

- 21  I '*'«>!  Ill  > 

p=0  i*l  k=*0 


i(N-i+k) 


>yi-iZ] 


N-l  N i-1  N-j 

tl  l (l  l 

i*l  jmi+1  k=*0  s=0 


. 2N*2r"v*  p r"  "tj  . 

-4a  [ l l ( l I a 9 ) yj.iyj.il 


N-l  . N-l  N i-1  . , 

-4(1  S'*,  I l III  a 

P“0  i>*l  jmi+1  kmO 

♦ f /iJ  T * a"-‘; ;»i-j a; 

i»2  k=0  s*0 

N-l  „ N i-1  . 

. r - >p»l.  , r > V 2f*/-j*A)  2. 

+ f 1 2pa  )[  l l l a )yj-i  1 

p=0  i**I  k=0 

♦ <»* 2"'J(  i If!  I 

1*1  j=i+l  k=0  s=0 


N-l 


N-l  N i-1 


If  O.  If  if  A * 

+ 2(1  2pa2p*2)ll  I fl  a 


2<N**)-j-i 


pmO  i*l  jmi+1  k*0 

N i-1  N-i 


i yi-i  y j 


- a2^2(  l ( l l [2(s+k)]a 
i*l  k*0  s-0 


i y i-i  y ji 
~1iyi-iyji 
1 i yjy  j-i  i 

yjyj.il 
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- ("I*  a2>**)[  f l2(N-i+k)]*2<’'‘i*k,'i 

yO  iml  km0 

N-l  N i-1  N-j 

- 2aiA/**l  l l (l  l t2(s+k)+j-i]a2<s*A)*i'i'1)yi.iyi.1] 

i*l  j-i+2  Jc-0  sm0 

N-l  N-l  N i-1  ... 

-2(1  a2p*2)ll  l (l  [2(N+k)-j-i]a2f***)~rl~1)  yi.tyj.il 
yO  i~l  j=*i+l  k*0 

C.  24 

Considering  the  32  terms  of  (C.24)  and  combining  them  in  the 
following  groups  - (1,5),  (2,6),  (3,7),  (4,8),  (9,13,17), 
(10,14,18),  (11,15,19),  (12,16,20),  (21,25,29),  (22,26,30), 
(23,27,31),  and  (24,28,32)  gives: 

N i-1  N-i 

Id  l 

i=l  k*0  s=0 
N i-1  N-l 

Id  l 

i**l  k=0  yO 


N-l 


l ( l l 2[N-s-kla2("*a*',''i  lyt* 


-Id  l l2(W-l+k-pla*‘M'M*”-i)yi9 


N-l  N i-1  N-j  „ , . 

+ 2 l l ( l l [2(N-s-k)-j+i]aZ<M*s**)*’‘x’1 
i«2  j=i+l  k=0  s=0 


) Vi  y> 


N-l  N i-1  N-l 

- 2 l l d l 1 2 (N+k-p) -j-i]a*(  A**)’i~i~l)yi  yy 

i«2  j-i+2  Jc-0  yO 

N N i-1  N-j  ... 

-d  id.  I (2(N-s-k)-j+i-i]a*<"^*>*’-l)yi.1yj 

i“l  j=i  kr*0  s=0 


N N i-1  N-l 

* 2 l l ( l l (2 (N+k-p) -j-i+lla*""*'**-’-1) yj.j  yy 
i— 2 j— i Jc=0  yO 

N-l  N i-1  N-j  . , 

- 2 l l (l  l 

i-2  j*i+2  Jc-0  s=0 
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r 


j 

! 


♦ 2 l l *?  Y [2(H+k-p)-j-i+lla*<"*h**,‘i-l)yiyi.i 

i-1  j-i+1  k-0  p-0 

* 2 1 a1  T 

i-1  Jc-0  s-0 

N i-I  N-l  , . 

- 2 l ( l l [N-i+k-pnia*'*'**  r>*2 ) yi-a 

i-2  *-0  p-0 


W-J  N i-1  N-j 

+ 2 l l (l  l 

i-1  j-i+1  k-0  s-0 
N-l  N i-1  N-l 

-*l  l <1  l 

i-1  j-i+1  k-0  p-0 


[2 (N-s-k-1) -j+i]a 


2(N***A)*j  -i*2 


>*-1  y,-i 


[2  (N+k-p+1) -j-i]a2'"***',~i~i*l)yi.1  yy.y 


C.  25 

Combining  terms  3 and  5,  4 and  6,  7 and  11,  8 and  12  of 
(C.25)  gives: 

N i-1  N-i 

I ( l l (2(N-s-k)]a  ^'s*h>'1)yi 

i-1  k-0  s-0 


- I / 1 Y [2(N-i+k-p)]a*f"-"Atf)-1)yi2 
i-1  k-0  p-0 

N i-1  N-i 

*2\(l  l (N-s-k-llat(N4S*,'U1)yi.1i 
i-1  k-0  s-0 

N i-1  N-l  . 

-2l<l  l n-Merm**1"-'—*'**  tti.,' 

i-1  k-0  p=0 

N-l  V N-j  . . . 

~ 2 l I (l  [2(N-s)-j-i]a*fM*9U’*i‘1)yiyj 
i-0  j=i+l  s-0 

N-l  N N-l 

+ 2 l l (l  [2(N-p)-j+i]a2<^f)~3'*i'1  ) yyyy 
i-0  j-i+1  p-0 
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2<N*s)*j+l  - ] 


2(N*»*H)*x 


N i-2  N-l 

+ 2 l ( l l [2  (N+k-p-i+1)  ] a 
i-2  k-0  p-0 


2 l l ( l [2(N-p)-j+iJa*'"**>-’*i-*)yi 
i-l  j-i+1  p=0 


2<M**+A)-X  . 2 


N-l  N N-j 

2 l III  [2(N-s)-j-i]at<"4*)*J*J-1 
i-0  j-i+1  s-0 


2 


2(n*t  + k)  + l 


l ( l [2(N-s) -j-i]a*<”',)+>+i-*)yi.1 


N+&p*l-l 


l (N-p]aZ(»V>~*yi2 


A " 2 l < 1 [N-s-i]a*<"*s*i>'1  )ytJ 


iml  sm0 

N-l  N-l  , 

+ 2 l ( l [N-2p+i]a**P'i~-1)yiyM 
i-0  p-0 

N-l  N-l 

~2l<l  [N-pla^W'iyi* 

i-0  p-0 

+ 2 l (I  [N-k]aZ(***)~*)yI* 
i-1  k-0 

- 2 l l [2N-j-i+Ua )yi  i y. 
i-1  j-i 


C.  28 


Rearranging  and  combining  terms  in  (C.28)  gives  the  desired 
result: 


N-l 


Vj  * 2 l [N~k]a 


2(N*k)-S  Z 


k=0 


+ 2 l ( l [N-2p+i]aN*Z**l'x)  ydyN 
i— 0 p=0 

N-l  N-l  . , . , 

- I l [2N-j-i]ae"*’  + i-*ydy,. 
i*=0  j=0 


C.  29  | 


Proposition  Let  QN  be  defined  by  Equation  (C.20).  Then  QN  can  be 
written  as: 

N N N 


Qn=  2 l l 1 (i-t)  a2**1**'1  y-  yj 

t=0  i=0  j=0 


C.  30 


Proof : 


By  induction  using  (C.23) , the  expanded  version  of  (C.20) , 
Qt  - 2ayt  2 + 2(l+az)  y0yx  - 4azy0yi  - 2a(l+az)y02  + 2a3 yz 
* 2y0ys  + (2y1z-2y0z)a  - 2y-yja2  C.31 
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From  (C.  30)  , 


2 .2 


ox  - ^ ^ ♦ *yx  - «y*  - * y*y,J 


Again  from  (C.30) , 


N-l  N-l  N-l 

QN  ‘ 2 I l l U-t) a** 

t-0  i-0  >0 


sfty> 


Assume  (C.30)  true  for  W-i.  Then  by  the  Lemma, 


0 + & 

n-i  “a/-j 


N-l  N-l  N-l 

2 l l l yy 

t-0  i-0  j-0 


N-l 

+ 2 l (N-k)a 
k-0 


2 (***)•  i 2 


N~1  N 

+ 2 l l (N-2p+i)a*J*Zr*l~*  y 
i-0  p-0 


A/-i  JV-i 

- I l (2N-j-i)az"*’*J-Jyly; 
i-0  j-0 


N-l  N-l  N-l 

- 2 l l l d-tia1**^'1^^ 

t-0  i-0  j-0 


N N N 

*2  l l l „ 

t-0  i-N  j-N 


yiyi 


N N-I  N 

+ 2 l l l 

t-0  i-0  j-N 


N N N-l 

* 2 I I I riW*1-''-***, 

t-0  i-N  j-0 


N N-l  N-l 


2 l l l (i-t)a 


t-N  i-0  j-0 


yj-y/ 


N N N 


•2  l l l (i- t)m”+**** 


t-0  i-0  j-0 


yjyj 
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C.  32 


C.  33 


C . 34 


C . 35 


C.36  | 


„ * 

i 
' I 
, 


Another  version  of  On  (Equation  3.72)  was  developed.  It  is 
written  in  the  standard  form  of  a polynomial  and  as  such  was  to  have 
been  used  with  known  polynomial  theory  to  yield  information  on  root 
locations.  The  form  of  the  coefficients  of  the  polynomial,  being 
quadratic  forms  in  the  measurements,  is  interesting.  However,  0N 
as  expressed  by  (3.73)  (or  (C.30))  has  more  practical  value  computation- 
ally. 

A proof  by  induction  is  outlined  for  this  second  version: 

N 


2(211-1) 

e„-  Jo  J0  ■"“«'** 

where : 


C . 37 


y„T  - (y0,-.-,yN) 

Sr  - shift  matrix 
J « unit  Hankel  matrix 
2 1 

0X  “ l t yx'(  l (k+l-4i)s.  J)yA)ak 
k-0  i-0  " 

- 2y,yt  + 2y,Za  - 2yJ  a - 2y,yJa* 

Assume  (C.37)  true  for  N-l.  Then 


C.  38 


'H-1 


2 ( 2N-3)  N-l 

I 1 l la‘ 

k-0  imO 


2 (2N-3) 


k-0 


N-l 

l 

i-0 


l IvS'f  l C.  39 


where 


y„*T  - (y0>yj 


/•  • • /y^j 


Consider 


Working  with  the  first  term  of 
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) 2 


2 l (N-k)a  * 


- I (4N-k-l)a*y  * 
k~2N-l 
(k  odd; 


4N-3  N-l 

- I (4N-k-l)  ( l SH  J)aky„* 

k-2N-l  i-0  1 21 

2 (2N-1)  N 

- I Uor \yj(  l <k+i-4i>s  j) 

k=0  i=0 

Working  with  the  second  term  of  &Nmti 


■—  }a* 

y* 


W-i  w 

2 l ( l (N+i~2j) 
i=0  j=0 


a ; yj  y* 


W-i  j+JW-i 

Z t l (2N+2j-k-l)  a*  yj  y J 
j-0  k=j+N-l 


where 


* = j+N-l,j+N+l,j+N+3,... 

4N-2  N 

! Jo  "’"•‘V'  J„  « 


Working  with  the  third  term  of  : 
W-l  W-2 

- J J 

p=0  r=0 


o 


2N-1 

l tyMr(k-l-2N)S  Jyja 


iN+k-Z 


4N-2 

J •%*'** 


Combining  (C.39)  and  (C,42)  gives 


4N-2  N 

l %•'(  I (k*i-4i,s 
k-0  i-0 

Combining  (C.43),  (C.41),  and  (C.42)  gives 


C.43 


* T V<  j.  I**1-41’3*.,.™  jv* 


A-0  i-0 


- 0. 


C.44 


Returning  to  V.,  (Equation  (C.19)),  and  using  (C.16)  and  (C.8), , 
N 

v»  - 1*1 1 

N N 


- -2a2  l l gS**'*'-1 

p-0  q-0 

can  be  expressed  in  standard  polynomial  form  though  in  a 
somewhat  awkward  manner, 

N N+i 

l l 

i-0  k-i 
N k 

l l 

k-1  i-0 


C.45 


V m ~2°2  l l (*-*)* 


2N  N , 

- -2a2 1 l l (k-i)a**‘J  * I l ( k-i)  a *A'J  7 

k-N+1  i-k-N 


2N  k N 

m-2o2  l ( l t + j t]a 

k-1  t—0  t-k-N 

(k<N)  (k>N) 


a*-2 


C.46 
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aN  - hx»%  v * s* 


D.  5 


where: 


<X\ 


S'' 
\ \ ' 


v ' 

\ ' 

\V\ 

•V'a'l 


From  (D.3)  and  (0.5),  the  likelihood  function  is  given  as, 

£ - p( va 

- (3vfi  |*,|'^  exp{-||  } 

where , 

Rt  » 

■ M ™] 

b.  The  determinant  of  R, 


Rr  ■ $t » o2J  + h262Mr 
|R,  | - | *[o2*~l  (tT)~l  + h262x;*T  | 


- |o2$~1(*r; -1  + h2e2x| 


Since, 


-a  x 

oX'X 


'-a 'I 


then. 


*-l  -l  , 


1 D 

-a  1+a2 

• "•  i,.:. 


Let  the  N x N matrices  4*,,  and  A M be  defined  as, 
¥*  = $ "l  (4>T>  _1 
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Using  Equation  (B.17) 


form  given  by  (D.9) , i.e.r  L 


may  be  expressed  as: 

- (a2)*  |v2i  + fj 

- (o2)*  l(l+\>2)  |v2j  + A 


v2  = h202/o2 


Then  (D.15)  may  be  rewritten  as 


and 


\*y\  “ 


c.  The  inverse  of  R, 


Prom  Equation  (D.9) 


-1 


(«r)  ^R-1*-1 


where, 

R'1  - (rjf1)  « [oz9~-1  (9T)-1  + hz&zI] 

By  the  same  process  that  led  to  Equation  (B.19) , 
-.-l  = 


where. 


(o*a)'  ( Li.1)(JN.j)/(l , j > i 


L#  t I 


rii  * = ry/-* 

d.  The  likelihood  equation 

Forming  L = o from  (D.6)  gives, 

aa 


“ 2[hx»(-^§J  + 4>)uN]TRy  ~1}(y„-hxe§M-hb*u„) 


I T n -1 


D.  18 


D.  19 


D.  20 


D.  21 


D.  22 


D.3  xc  UNKNOWN  PARAMETER 

The  likelihood  equation  for  a is  given  by  Equation  (D.22)  with 
xQ  replaced  by  5?0.  Prom  (D.2)  , 


y« 

1 

/ 

’ ' 

1. 

yi 

a 

u# 

to 

Hi 

y2 

a* 

Cl 

e 

m hx0 

• 

+ hb*0 

• 

+ 

• 

+ 

• 

e 

• 

• 

• 

• 

• 

• 

• 

• 

• 

- % . 

a* 

. UH-1  . 

. CM-j, 

. . 

where : 


0_ 

* 


D.  23 
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Considering  the  set  of  random  variables  (r\,  ,nt  , . . . ,r\M  ,Z0, . . . ) , 

the  likelihood  function  can  be  written  as: 

N* 1 1 i 2 

L-  (2v)  2 (a2)  *exp{-  j [\ a* -hb*«J  |R-i 

+ (o2)  ~l(yc-hxt)2]}  D.  24 

Forming  |^—  gives , 

(yH-hx»aN-hb*liN)rRy~1aN  + (a2)'1  (y,-hx,)  - 0 0.25 

or,  *t  - [ (yt/-hb<i\itl)TRy-l§Na2+y0]/[h(l+a„'rRy-1aNo2)]  D.26 

D.4  x„  UNKNOWN  RANDOM  VARIABLE 
o 

a.  The  likelihood  function 
Assume:  x„  ^ Tj  (20,z2) 

and  x0  independent  of  (5^}  and  (nj) 

By  independence, 

(*$  /I*  /ty  ,•  • • /H*/ / • • • ^ T)  (&e  *'Rxr\^  D.27 

where : 


L ■ p( yo'Vi  f',yNfa) 


Note  that 


Then 


Let  Lt  be  the  determinant  of  the  t * t matrix  having  the  form 
of  Equation  (D.33).  Expressing  in  terms  of  (D.ll)  and 

expanding  the  t * t matrix  gives  (noting  that  1 4^  | * 1) : 

[h2e2+o2  ah2e2  0 q 

ah2e2  a2h2e2+S2+cr2  -ao? 

0 -ao2  B2+o2d+a27  ''' v. 


'*■»  2 

-aaz 


o 


-ao‘ 


B2+o2 (1+a2) 


D.  34 


or. 


Lfc  - (he2+o2H(h2e2a2+B2+o2H|B2I+o2At_2|; 

- a2a4r|B2i+o2At_3|;;  -h4e4a2(|B2I+o2At_2|; 

= [h2e2( &2+o2)+o2  (h2e2a2+B2+o27  ] (|  B2J+a2At_2l^ 

- a2a4(|B2I+o2At_j|;  D. 35 

Using  (B.17), 

Ii£  * ( (o2+Ji2e2j  (B2+o2)+o2Ji2e2a2J Jfc_2  ~ a2a**Jt-3  ~ ^ D.36 

where : 

Lj  = o2  + h2e2 

L2  * (o2+h2e2^ (a2+B27  + h2e2a2o2 

Lj  * (o2+h2e27 f(o2+B27  +o2B 2a27  + h2eza2a2 [&2+o2  (1+a2) ] 

TT  7B2  + a2  (1+a2)  - 2aa2 cos — “7 
' p+i 


*=2 


Thus, 

|r 


= L. 


y1  ~N*t 
c.  The  inverse  of  R, 


D.  37 


From  (D.33) 
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D.  38 


r 


Ry  ~l  » 

where, 

R“l  - (2 if1)  - fh2e2S  + o2*,  *l  (*,  ')  "I  + 


D.39 


Following  the  same  procedure  to  find  (D.21)  and  (B.19) , 

i-i  - 


tjf1  - ljjl  - (a2  a)  (Lj.j)  , j i i D.40 

where : 

r(h2e2a2+&2Hj2)JN,1  - a2a4j„_2  , j - 1 


H+t-j 

J0  m 1 

L0  » I 

d.  The  likelihood  equation 


, 2 S j S N+l 


From  (D.  29)  , forming  - w ^ gives, 

QcL 


l«y  I ^ + Uli „-*****-&*,¥„)*(-£  Ry"1) 


- 2 [hX,(-£  aM)  + hb  (-£*,)  yM]  TRy  _1 } (yM  -hX,  aN -AM, 


0.41 


D.5  THE  DIFFERENCING  APPROACH 

a.  The  likelihood  function 

From  (D.l) , the  equivalent  model  for  this  scheme  is 
yi* j m a yj  + hbui  + -anjJ  i * 0,1,.. 


D.  42 


where:  yQ  is  a known  constant 


Stacking  (D.42) 
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y,  -*y# 

».  1 

Co 

y*-*yx 

• 

• 

m hb 

«» 

• 

• 

♦ A 

• 

• 

♦ 

c* 

e 

• 

D.43 

• 

. y»-ay*.,. 

• 

, Uw-J  , 

• 

w 

• 

. tw-|. 

l 

or  with  the  obvious  definitions 


Then  from  (B.17)  and  (B.19) 


p 

m Tf  [h2B2+o2 (1+a2) -2ao2  cos  ^7-  J 
k~l  P+1 

V1  - <**?> 

where : 

IijL  ~ (v2*)  71  (3i-i  ) (jH.j)/(j„)  , j > i 

j0  - 1 

l*yl  " 3* 

c.  The  likelihood  equation 

From  (D.48)  forming  ~~^L  = 0 gives, 

l*yl  ~lrdfl^yl;  + 

" 2(\*)  yRy  -1  > 

= 0 


D.  50 

D.  51 

D.  52 

D.  53 


D.  54 
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APPENDIX  E 


RECURSIONS  FOR  FIXED  POINT  CURVE  FITTING 


E.l  x_  KNOWN 


From  Equation  (3.9) , the  derivative  function  Djf(a)  at  some  point 
can  be  expressed  as: 


+hb(V 


The  recursions  are  given  below  for  n * 1 


Dn  - *>n-l  ♦ (yn-An-hbV  (**n-\+t*(Vn-Un)) 


where  U. 


E.2  V0  UNKNOWN  PARAMETER 


This  cue  is  similar  to  the  previous  except  xQ  becomes  xQjt  which 
changes  with  each  new  sample.  Rearranging  (E.l)  gives  DN(a)  at  seme 
point  a ^ for  this  case. 


m DH  + (DN  ~ °N  x°n 


where : 


N 

ojf  • b l Ky^hbu],)  (1%  - u\)J 

. W 1 l-l 

Dn  m 1 l(ni-hboi)  (la  )J 


N 


ot  - hb  l [a*  (U2.-lA)  ] 
N i-i  1 1 1 


6.11 


E.  12 


E.13 


6.14 


DS  " h l Ul 

i-l 


From  Equation  (3.20) 


xoH  ’ VZ« 

where: 


? i ..  r c 21 -j 

*N  * l y±a\  -**  l l ax  Uj 
1-0  a i-l  >1  J 


N i 


N 


ZN  " h l *1 
1-0 


21 


The  recursions  for  n > 1 are  given  below. 


*n  ' Vn-i 

where  a\  - 1 
o 

*n  ■ Vl  * »A  - “M 

where  jr  - y 
o 


E.  15 


E.  16 


E.17 


E.18 


E.19 


E.  20 
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B .21 


Z„  - Z_  , ♦ 
n n-i 


where  Z 

p1  . p1  • 

n n-1 


where  D 


1 


n-1 


where  P 


3 3 

D-D  , 
n n-1 


where  D 


4 4 

D --D  , 

n n-1 


where  p 


1 


"V2 

- h 

b(y  -hbu1) (U2~UJ 
n n n n 

- o 

(y-hbUJ  fnA*  J 
n n n-i 

■ y,  - hjbu 
1 o 

12  3 

hbA  (U  -U  ) 
n n n 

- 0 


E. 3 x UNKNOWN  RANDOM  VARIABLE 
O 

From  Equation  (3.45) , the  derivative  function  D (a)  at 

N 

aj  can  be  expressed  as: 

DH(a x)  - { lo2  (o2+h2e2Cff)  + A2j?02a2¥7c£ 

* C*vl2C^hXo<32+C^hb(o2+h2t2Clv)  + cfat2^ 

- 2C^hibt2C2  + C^hb(o2+h2t2C^] 

+ C^cJjhb(a2+h2E2C^-C®(o2+h2e2cJ; 

- h*oo2y- «2J*0cJ; 

- C^h3be2(Y+C*72 ] 

+ C^/C^C^h4Jb2e2-C^h2b2  (o2+h2c2C^) 

- 2h2g0bo2C2N] 

+ Cff[h2gobo2 (V+Cp) ] 

* CN  t*2*0b°2  (f+CN}  ~CNI,2b2  <°2+h2*2cl)  J 
+ clth'*b2c2('l+clN))} 


where : 


E.22 

£.23 

E.  24 

E.  25 

seme  point 


E.26 
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E.2 

E.2 

E.2 

E.3 

E.3 

E.3 

E.3 

E.3 


9 " 12  3 

c»  ' JE,  - VJ 


* “ && 

12  3 

The  recursions  for  C , C . and  C are  given  by  (E.21),  (E.25), 

Pi  N Pi 

and  (E. 24) , respectively.  The  recursions  for  the  resiainlng  terms  are 
given  below  for  n 2 1. 


C - C , + A u 
n n- 1 n n 

Cn  " Cn  1 + nAn  1 V\ 

n n-l  n-l  n 

6 6 1 

C m C . + A*u 
n n-l  nyn 


'C1  ' aluo 
'cl  - uo 

4 mVo+  al»l 


Cn  * cn-l  ♦ W-l»n  4 * »l 


.8  _8 


c„  * cn-l  * * V 'cl  * 0 

9 9 a o 3 9 

Cn  * c„-l  * ®>.  * V 'C1  * « 
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